Schedule for Today


* 9am to 10:20am: Introduction; Overview of Inherently Interpretable Models
* 10:20am to 10:40am: Break
* 10:40am to 12pm: Overview of Post hoc Explanation Methods
* 12pm to 1pm: Lunch
* 1:05pm to 1:25pm: Breakout Groups
* 1:25pm to 2:45pm: Evaluating and Analyzing Model Interpretations and Explanations
* 2:45pm to 3pm: Break
* 3pm to 4pm: Analyzing Model Interpretations and Explanations, and Future Research Directions
[ Weller 2017 ]


Is Model Understanding Needed Everywhere? 


[Figure: examples of low-stakes recommendations: an Amazon.com email suggesting this week's bestselling Canon PowerShot point-and-shoot cameras, and a "People You May Know" friend suggestion]


[ Weller 2017, Lipton 2017, Doshi-Velez and Kim 2016 ] 


When and Why Model Understanding? 


- Not all applications require model understanding
* E.g., ad/product/friend recommendations
* No human intervention


- Model understanding not needed because:
* Little to no consequences for incorrect predictions


* Problem is well studied and models are extensively validated in
real-world applications → trust model predictions


When and Why Model Understanding? 


ML is increasingly being employed in complex high-stakes settings 


When and Why Model Understanding? 


- High-stakes decision-making settings 
- Impact on human lives/health/finances 
- Settings relatively less well studied, models not extensively validated 


- Accuracy alone is no longer enough 


- Train/test data may not be representative of data encountered in 
practice 


- Auxiliary criteria are also critical: 
- Nondiscrimination 
- Right to explanation 
- Safety 


When and Why Model Understanding? 


- Auxiliary criteria are often hard to quantify (completely)


* E.g.: Impossible to predict/enumerate all scenarios violating safety of an
autonomous car


- Incompleteness in problem formalization
- Hinders optimization and evaluation
* Incompleteness ≠ Uncertainty; uncertainty can be quantified


When and Why Model Understanding? 


Model understanding becomes critical when: 


models not extensively validated in applications; 
train/test data not representative of real time data 


key criteria are hard to quantify, and we need to rely on 
a "you will know it when you see it" approach 


Example: Why Model Understanding? 


Predictive Model → Prediction = Siberian Husky


[ Larson et al. 2016 ]


Example: Why Model Understanding?


Predictive Model → Prediction = Risky to Release


Example: Why Model Understanding? 


Loan Applicant Details → Predictive Model → Prediction = Denied Loan


Loan Applicant: "I have some means [...]"


Model understanding helps provide recourse to individuals
who are adversely affected by model predictions.


Example: Why Model Understanding? 


Patient Data (e.g., 31, Male, ...) → Predictive Model → Predictions (Healthy / Sick)


Model Understanding: "This model is using irrelevant features when [...] on female [...]. I should [...] predictions [...]"


Model understanding helps assess if and when to trust
model predictions when making decisions.
Example: Why Model Understanding?


Patient Data (e.g., 31, Male, ...) → Predictive Model → Predictions (Healthy / Sick)


Model understanding allows us to vet models to determine
if they are suitable for deployment in the real world.
Summary: Why Model Understanding?


Utility 


Debugging 

Bias Detection 

Recourse 

If and when to trust model predictions 


Vet models to assess suitability for 
deployment 


Stakeholders 


End users (e.g., loan applicants) 
Decision makers (e.g., doctors, judges) 


Regulatory agencies (e.g., FDA, European 
commission) 


Researchers and engineers 
[ Letham and Rudin 2015; Lakkaraju et al. 2016 ]


Achieving Model Understanding 


Take 1: Build inherently interpretable predictive models 


Tear production rate 


if (age = 18-20) and (sex = male) then predict yes
else if (age = 21-23) and (priors = 2-3) then predict yes
else if (priors > 3) then predict yes
else predict no
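
A rule list like the one above reads directly as ordinary control flow, which is much of its appeal. A minimal sketch, assuming the function name and argument encoding (integer age, string sex, integer count of priors), none of which come from the slides:

```python
def predict_rule_list(age: int, sex: str, priors: int) -> str:
    """Evaluate the if / else-if rules top to bottom; the first match decides."""
    if 18 <= age <= 20 and sex == "male":
        return "yes"
    elif 21 <= age <= 23 and 2 <= priors <= 3:
        return "yes"
    elif priors > 3:
        return "yes"
    # Default rule: reached only when no earlier rule applies.
    return "no"
```

Because the rules are ordered, the explanation for any single prediction is simply the first rule that fired.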
[ Ribeiro et al. 2016; Ribeiro et al. 2018; Lakkaraju et al. 2019 ]


Achieving Model Understanding 


Take 2: Explain pre-built models in a post-hoc manner 


Explainer:


if (age = 18-20) and (sex = male) then predict yes
else if (age = 21-23) and (priors = 2-3) then predict yes
else if (priors > 3) then predict yes
else predict no
[ Ciresan et al. 2012, Caruana et al. 2006, Frosst et al. 2017, Stewart 2020 ]
Inherently Interpretable Models vs.
Post hoc Explanations


[Figure: accuracy vs. interpretability for example models: Linear Regression, Decision Trees, Random Forests, Neural Networks]


In certain settings, accuracy-interpretability trade offs may exist. 


Inherently Interpretable Models vs. 
Post hoc Explanations 




complex models might 
achieve higher accuracy 


can build interpretable + 
accurate models 


Inherently Interpretable Models vs. 
Post hoc Explanations 


Sometimes, you don't have enough data to build your model from scratch. 


And, all you have is a (proprietary) black box! 




Inherently Interpretable Models vs. 
Post hoc Explanations 


If you can build an interpretable model which is also adequately 
accurate for your setting, DO IT! 


Otherwise, post hoc explanations come to the rescue! 


Agenda 

Inherently Interpretable Models 

Post hoc Explanation Methods 

Evaluating Model Interpretations/Explanations 

Empirically & Theoretically Analyzing Interpretations/Explanations 


Future of Model Understanding 


Inherently Interpretable Models 


- Rule Based Models 

- Risk Scores 

- Generalized Additive Models 
- Prototype Based Models 

- Attention Based Models 


[Letham et. al. 2016] 


Bayesian Rule Lists 


- A rule list classifier for stroke prediction


if hemiplegia and age > 60 then stroke risk 58.9% (53.8%-63.8%)
else if cerebrovascular disorder then stroke risk 47.8% (44.8%-50.7%)
else if transient ischaemic attack then stroke risk 23.8% (19.5%-28.4%)
else if occlusion and stenosis of carotid artery without infarction then stroke risk 15.8% (12.2%-19.6%)
else if altered state of consciousness and age > 60 then stroke risk 16.0% (12.2%-20.2%)
else if age < 70 then stroke risk 4.6% (3.9%-5.4%)
else stroke risk 8.7% (7.9%-9.6%)


Bayesian Rule Lists 


- A generative model designed to produce rule lists (if/else-if) that 
strike a balance between accuracy, interpretability, and 
computation 


- What about using other similar models?
* Decision trees (CART, C5.0 etc.)
* They employ greedy construction methods


* Not computationally demanding, but greedy construction affects the quality of the solution, both
accuracy and interpretability
Bayesian Rule Lists: Generative Model


- Sample a decision list length $m \sim p(m \mid \lambda)$.
- Sample the default rule parameter $\theta_0 \sim \mathrm{Dirichlet}(\alpha)$.
- For decision list rule $j = 1, \dots, m$:
    Sample the cardinality of antecedent $a_j$ in $d$ as $c_j \sim p(c_j \mid c_{<j}, A, \eta)$.
    Sample $a_j$ of cardinality $c_j$ from $p(a_j \mid a_{<j}, c_j, A)$.
    Sample rule consequent parameter $\theta_j \sim \mathrm{Dirichlet}(\alpha)$.
- For observation $i = 1, \dots, n$:
    Find the antecedent $a_j$ in $d$ that is the first that applies to $x_i$.
    If no antecedents in $d$ apply, set $j = 0$.
    Sample $y_i \sim \mathrm{Multinomial}(\theta_j)$.


A is a set of pre-mined antecedents 


Model parameters are inferred using the Metropolis-Hastings algorithm which is a 
Markov Chain Monte Carlo (MCMC) Sampling method 
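
To make the last two stages of the generative model concrete, here is a rough forward-sampling sketch for a decision list whose antecedents are already fixed. It does not sample the list length or the antecedents, and it does no Metropolis-Hastings inference. The function name, the dictionary encoding of patients, and the example antecedents are assumptions for illustration only.

```python
import numpy as np

def sample_labels(X, antecedents, alpha, rng):
    """Sample labels from a fixed decision list d (hypothetical helper).

    antecedents: list of boolean functions a_1..a_m, checked in order.
    alpha: Dirichlet prior pseudo-counts, one entry per class.
    """
    m = len(antecedents)
    # theta[0] is the default rule parameter, theta[1..m] the rule consequents.
    theta = rng.dirichlet(alpha, size=m + 1)
    rule_used = np.zeros(len(X), dtype=int)
    y = np.zeros(len(X), dtype=int)
    for i, x in enumerate(X):
        # First antecedent that applies to x_i; j = 0 if none applies.
        j = next((k + 1 for k, a in enumerate(antecedents) if a(x)), 0)
        rule_used[i] = j
        y[i] = rng.choice(len(alpha), p=theta[j])
    return y, rule_used, theta

# Illustrative data and antecedents (not taken from the paper's dataset).
patients = [
    {"age": 72, "hemiplegia": True, "cerebrovascular": False},
    {"age": 50, "hemiplegia": False, "cerebrovascular": True},
    {"age": 30, "hemiplegia": False, "cerebrovascular": False},
]
rules = [
    lambda p: p["hemiplegia"] and p["age"] > 60,
    lambda p: p["cerebrovascular"],
]
labels, rule_used, theta = sample_labels(
    patients, rules, alpha=[1.0, 1.0], rng=np.random.default_rng(0)
)
```

Each row of `theta` plays the role of the per-rule risk shown in the stroke example: the probability of each class for observations captured by that rule.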


Pre-mined Antecedents 


- A major source of practical feasibility: pre-mined antecedents 
* Reduces model space 
* Complexity of problem depends on number of pre-mined antecedents 


- As long as pre-mined set is expressive, accurate decision list can 
be found + smaller model space means better generalization 
(Vapnik, 1995) 
[Lakkaraju et al. 2016]


Interpretable Decision Sets 


- A decision set classifier for disease diagnosis 


If Respiratory-Illness = Yes and Smoker = Yes and Age > 50 then Lung Cancer
If Risk-LungCancer = Yes and Blood-Pressure > 0.3 then Lung Cancer
If Risk-Depression = Yes and Past-Depression = Yes then Depression
If BMI > 0.3 and Insurance = None and Blood-Pressure > 0.2 then Depression
If Smoker = Yes and BMI > 0.2 and Age > 60 then Diabetes
If Risk-Diabetes = Yes and BMI > 0.4 and Prob-Infections > 0.2 then Diabetes
If Doctor-Visits > 0.4 and Childhood-Obesity = Yes then Diabetes


Interpretable Decision Sets: Desiderata 


- Optimize for the following criteria
* Recall
* Precision
* Distinctness
* Parsimony
* Class Coverage


- Recall and Precision → Accurate predictions


- Distinctness, Parsimony, and Class Coverage → Interpretability


IDS: Objective Function 


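* Parsimony
* Fewer rules: $f_1(R) = |S| - \mathrm{size}(R)$, where $\mathrm{size}(R)$ is the number of rules and $|S|$ is the total number of input patterns
* $f_2(R) = L_{max} \cdot |S| - \sum_{r \in R} \mathrm{length}(r)$, where $\mathrm{length}(r)$ is the number of conditions in the rule and $L_{max}$ is the maximum number of conditions in any given input pattern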
IDS: Objective Function


* Distinctness
= Intra-class overlap:

$f_3(R) = N \cdot |S|^2 - \sum_{r_i, r_j \in R,\ i \le j,\ c_i = c_j} \mathrm{overlap}(r_i, r_j)$

where $N$ is the total number of data points and $\mathrm{overlap}(r_i, r_j)$ is the number of points that satisfy both the rules

= Inter-class overlap:

$f_4(R) = N \cdot |S|^2 - \sum_{r_i, r_j \in R,\ i \le j,\ c_i \ne c_j} \mathrm{overlap}(r_i, r_j)$
IDS: Objective Function


= Class Coverage

$f_5(R) = \sum_{c' \in C} \mathbb{1}(\exists\, r = (s, c) \in R \text{ such that } c = c')$

The indicator checks if there exists some rule
corresponding to a given class $c'$


IDS: Objective Function 


= Precision
* Minimize "incorrect" covers:

$f_6(R) = N \cdot |S| - \sum_{r \in R} |\text{incorrect-cover}(r)|$

Given a rule $r = (s, c)$, incorrect-cover($r$) is the set of data points
which satisfy $s$ but do not belong to class $c$.
IDS: Objective Function


* Recall
= Encourage at least one "correct" cover per data point:

$f_7(R) = \sum_{(x, y) \in D} \mathbb{1}(|\{r \mid (x, y) \in \text{correct-cover}(r)\}| \ge 1)$

Given a rule $r = (s, c)$, correct-cover($r$) is the set of data points
which satisfy $s$ and belong to class $c$.
IDS: Objective Function


* Complete objective is

$\arg\max_{R \subseteq S \times C} \sum_{i=1}^{7} \lambda_i f_i(R)$


The intra-class and inter-class overlap terms are 
non-monotone 


The parsimony, overlap, and precision terms are 
non-normal 


All the component terms are submodular 
IDS: Optimization Procedure


- The problem is a non-normal, non-monotone, submodular 
optimization problem 


- Maximizing a non-monotone submodular function is NP-hard 


- Local search method which iteratively adds and removes 
elements until convergence 
* Provides a 2/5 approximation 


Risk Scores: Motivation 


- Risk scores are widely used in medicine and criminal justice 
* E.g., assess risk of mortality in ICU, assess the risk of recidivism 


- Adoption: decision makers find them easy to understand


- Until very recently, risk scores were constructed manually by 
domain experts. Can we learn these in a data-driven fashion? 


[Ustun and Rudin, 2016] 
Risk Scores: Examples 


Recidivism

1. Prior Arrests ≥ 2
2. Prior Arrests ≥ 5
3. Prior Arrests for Local Ordinance
4. Age at Release between 18 to 24
5. Age at Release ≥ 40

ADD POINTS FROM ROWS 1-5 → SCORE

SCORE | -1    | 0     | 1     | 2     | 3     | 4
RISK  | 11.9% | 26.9% | 50.0% | 73.1% | 88.1% | 95.3%


Loan Default

1. Call between January and March
2. Called Previously
3. Previous Call was Successful
4. Employment Indicator < 5100
5. 3 Month Euribor Rate ≥ 100

ADD POINTS FROM ROWS 1-5 → SCORE

RISK  | 4.7% | 11.9% | 26.9% | 50.0% | 73.1% | 88.1%
Objective function to learn risk scores


Definition 1 (Risk Score Problem, RISKSLIMMINLP)
The risk score problem is a discrete optimization problem with the form:


$\min_{\lambda} \; l(\lambda) + C_0 \|\lambda\|_0 \quad \text{s.t.} \quad \lambda \in \mathcal{L}$    (1)


where:
* $l(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \log(1 + \exp(-\langle \lambda, y_i x_i \rangle))$ is the normalized logistic loss function;
* $\|\lambda\|_0 = \sum_{j=1}^{d} \mathbb{1}[\lambda_j \neq 0]$ is the $\ell_0$-seminorm;
* $\mathcal{L} \subset \mathbb{Z}^{d+1}$ is a set of feasible coefficient vectors (user-provided);
* $C_0 > 0$ is a trade-off parameter to balance fit and sparsity (user-provided).


Above turns out to be a mixed integer program, and is optimized using a cutting plane 
method and a branch-and-bound technique. 
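
As a rough illustration of the objective, the sketch below evaluates it for a given integer coefficient vector and maps a total score to a predicted risk through the logistic function. It does not solve the mixed integer program. The function names and the coefficient values are hypothetical; the intercept of -1 is chosen only so that a total of 1 point maps to a risk of 50%, as in the recidivism table.

```python
import math

def risk_slim_objective(lam, X, y, c0):
    """Normalized logistic loss plus C0 times the number of non-zero coefficients.

    lam[0] is the intercept (not penalized); each row of X starts with a 1;
    labels y are in {-1, +1}.
    """
    loss = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(l * v for l, v in zip(lam, x_i))
        loss += math.log1p(math.exp(-margin))
    l0 = sum(1 for l in lam[1:] if l != 0)
    return loss / len(X) + c0 * l0

def predicted_risk(lam, x):
    """Logistic transform of the integer score <lam, x>."""
    score = sum(l * v for l, v in zip(lam, x))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical integer coefficients: intercept -1, then one entry per row of the score sheet.
lam = [-1, 1, 1, 1, 1, -1]
one_point = [1, 1, 0, 0, 0, 0]   # only row 1 applies
two_points = [1, 1, 1, 0, 0, 0]  # rows 1 and 2 apply
```

With these assumed values, `predicted_risk(lam, one_point)` is 0.5 and `predicted_risk(lam, two_points)` is about 0.731, matching consecutive entries of the risk row.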


[Lou et. al., 2012; Caruana et. al., 2015] 


Generalized Additive Models (GAMs) 


[Figure: learned GAM shape functions for Demand, plotted against Hour (00 to 23) and Temperature (Celsius)]
Formulation and Characteristics of GAMs


Model                      | Form
Linear Model               | $y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
Generalized Linear Model   | $g(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
Additive Model             | $y = f_1(x_1) + \dots + f_n(x_n)$
Generalized Additive Model | $g(y) = f_1(x_1) + \dots + f_n(x_n)$
Full Complexity Model      | $y = f(x_1, \dots, x_n)$

Intelligibility decreases from the top row to the bottom row.


g is a link function; e.g., the identity function in case of regression;
$\log(y / (1 - y))$ in case of classification;


f is a shape function
GAMs and GA²Ms


- While GAMs model first order terms, GA²Ms model second order
feature interactions as well.


GAMs: $g(y) = \beta_0 + \sum_j f_j(x_j)$


GA²Ms: $g(y) = \beta_0 + \sum_j f_j(x_j) + \sum_{i \neq j} f_{ij}(x_i, x_j)$


GAMs and GA²Ms


- Learning:
* Represent each component as a spline


* Least squares formulation; optimization problem to balance
smoothness and empirical error


- GA²Ms: Build GAM first and then detect and rank all possible
pairs of interactions in the residual
* Choose top k pairs
* k determined by CV


[Bien et. al., 2012] 


Prototype Selection for Interpretable 
Classification 


The goal here is to identify K prototypes (instances) from the 
data s.t. a new instance which will be assigned the same label as 
the closest prototype will be correctly classified (with a high 
probability) 


- Let each instance "cover" the ε-neighborhood around it.


- Once we define the neighborhood covered by each instance, this 
problem becomes similar to the problem of finding rule sets, and 
can be solved analogously. 


Prototype Selection for Interpretable 
Classification 


Given a value for ε, the choice of $P_1, \dots, P_L$ induces L partial covers of the training
points by ε-balls. Here ε is varied from the smallest (top-left panel) to approximately
the median interpoint distance (bottom-right panel).


[Li et. al. 2017, Chen et. al. 2019] 


Prototype Layers in Deep Learning Models 


[Figure: network architecture. The input x goes through an encoder network f to give z = f(x); a decoder network g produces the reconstructed input (g ∘ f)(x); the prototype classifier network h consists of a prototype layer p, a fully-connected layer w, and a softmax layer s, and outputs (h ∘ f)(x)]


Prototype Layers in Deep Learning Models


The prototype layer is responsible for computing the distances between the encoded input and each prototype:

$z = f(x), \quad p(z) = [\|z - p_1\|_2^2, \; \|z - p_2\|_2^2, \; \dots, \; \|z - p_m\|_2^2]^T$

Each node in layer $p$ computes one of the above elements.


Prototype Layers in Deep Learning Models


= The fully connected layer computes weighted
sums of the distances $\|z - p_i\|_2^2$: $W p(z)$

= $W$ is a $K \times m$ matrix


Prototype Layers in Deep Learning Models


The weighted sums $W p(z)$ are normalized by the softmax layer to output
a probability distribution over $K$ classes
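
The three stages of the prototype classifier network can be sketched as a short forward pass. This is only an illustration of the computation described above, with the encoder and decoder left out; the function name, the two prototypes, and the weight matrix are made-up values.

```python
import numpy as np

def prototype_classifier(z, prototypes, W):
    """Forward pass of h: prototype layer -> fully-connected layer -> softmax.

    z: encoded input of shape (d,); prototypes: (m, d); W: (K, m).
    """
    # Prototype layer: squared L2 distance from z to each of the m prototypes.
    p = np.sum((prototypes - z) ** 2, axis=1)
    # Fully-connected layer: weighted sums of the distances.
    logits = W @ p
    # Softmax layer, shifted by the max for numerical stability.
    exp = np.exp(logits - logits.max())
    return p, exp / exp.sum()

z = np.array([0.0, 0.0])
prototypes = np.array([[0.0, 0.0], [3.0, 4.0]])
# Assumed weights: negative entries so that a smaller distance gives a larger logit.
W = -np.eye(2)
distances, probs = prototype_classifier(z, prototypes, W)
```

Reading off which prototype is closest to `z`, and how `W` connects that prototype to each class, is what makes the prediction inspectable.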


[Bahdanau et. al. 2016; Xu et. al. 2015] 


Attention Layers in Deep Learning Models 


- Let us consider the example of machine translation 


[Figure: Input → Encoder → Context Vector → Decoder → Outputs]


Attention Layers in Deep Learning Models 


- Context vector corresponding to $s_i$ can be written as follows:

$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$, where $T$ is the number of input tokens

$\alpha_{ij}$ captures the attention placed on input token $j$ when
determining the decoder hidden state $s_i$; it can be computed as a
softmax of the "match" between $s_{i-1}$ and $h_j$
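
A small numerical sketch of the context vector computation follows. The "match" score here is a plain dot product, which is an assumption made to keep the example short; Bahdanau et al. learn the score with a small feed-forward network. The function name and inputs are illustrative.

```python
import numpy as np

def attention_context(s_prev, H):
    """Compute attention weights and the context vector.

    s_prev: previous decoder hidden state, shape (d,).
    H: encoder hidden states h_1..h_T, shape (T, d).
    """
    scores = H @ s_prev                      # assumed dot-product "match"
    exp = np.exp(scores - scores.max())      # softmax, shifted for stability
    alpha = exp / exp.sum()                  # attention weights over input tokens
    context = alpha @ H                      # weighted sum of encoder states
    return alpha, context

H = np.eye(3)                        # three input tokens with one-hot states
s_prev = np.array([10.0, 0.0, 0.0])  # decoder state that matches token 1
alpha, context = attention_context(s_prev, H)
```

The weights `alpha` are what attention-based explanations visualize: one non-negative number per input token, summing to 1.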


Inherently Interpretable Models 


- Rule Based Models 

- Risk Scores 

- Generalized Additive Models 
- Prototype Based Models 

- Attention Based Models 


Agenda 

Inherently Interpretable Models 

Post hoc Explanation Methods 

Evaluating Model Interpretations/Explanations 

Empirically & Theoretically Analyzing Interpretations/Explanations 


Future of Model Understanding 


What is an Explanation? 


Definition: Interpretable description of the model behavior 


[Figure: a classifier and its explanation]


Faithful    Understandable


[ Lipton 2016 ]


What is an Explanation? 


Definition: Interpretable description of the model behavior 


Classifier 


Describe how to flip the model prediction 


Local Explanations vs. Global Explanations 


Local Explanations
- Explain individual predictions
- Help unearth biases in the local neighborhood of a given instance
- Help vet if individual predictions are being made for the right reasons

Global Explanations
- Explain complete behavior of the model
- Help shed light on big picture biases affecting larger subgroups
- Help vet if the model, at a high level, is suitable for deployment
Approaches for Post hoc Explainability


Local Explanations
- Feature Importances
- Rule Based
- Saliency Maps
- Prototypes/Example Based
- Counterfactuals

Global Explanations
- Collection of Local Explanations
- Representation Based
- Model Distillation
- Summaries of Counterfactuals


LIME: Local Interpretable Model-Agnostic
Explanations


1. Sample points around x_i
2. Use model to predict labels for each sample
3. Weigh samples according to distance to x_i
4. Learn simple linear model on weighted samples
5. Use simple linear model to explain
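
The five steps can be sketched in a few lines for tabular inputs. This is a simplified illustration and not the LIME library: it perturbs with Gaussian noise, uses an exponential kernel, and fits an ordinary weighted least squares model, whereas Ribeiro et al. work with interpretable (e.g., binary) representations and a sparse linear model. The function name and all parameter defaults are assumptions.

```python
import numpy as np

def lime_explain(predict_fn, x0, n_samples=500, sigma=1.0, kernel_width=1.0, seed=0):
    """Fit a local linear surrogate to predict_fn around the instance x0."""
    rng = np.random.default_rng(seed)
    # 1. Sample points around x0.
    Z = x0 + sigma * rng.standard_normal((n_samples, x0.size))
    # 2. Use the black-box model to predict labels for each sample.
    y = predict_fn(Z)
    # 3. Weigh samples according to their distance to x0.
    dist = np.linalg.norm(Z - x0, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Learn a simple linear model on the weighted samples (weighted least squares).
    A = np.hstack([np.ones((n_samples, 1)), Z - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    # 5. The coefficients are the explanation; the intercept approximates f(x0).
    return coef[1:], coef[0]
```

For example, `lime_explain(model_fn, np.array([0.5, -1.0]))` returns one weight per feature, where `model_fn` is any function mapping an (n, 2) array to n predictions. Only predictions are needed, which is what makes the method model-agnostic.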


[ Ribeiro et al. 2016 ] 


Predict Wolf vs Husky 


Image | Predicted | True
1     | wolf      | wolf
2     | husky     | husky
3     | wolf      | wolf
4     | wolf      | husky
5     | husky     | husky
6     | wolf      | wolf


Only 1 mistake!


[ Ribeiro et al. 2016 ]


Predict Wolf vs Husky


[Figure: two images, each with Predicted: husky, True: husky]


SHAP: Shapley Values as Importance 


Marginal contribution of each feature towards the prediction,
averaged over all possible permutations.


[Figure: illustration of a feature attribution; recoverable label: M(x, 0) = 0.1]


Attributes the prediction to each of the features.
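
For a handful of features, the permutation average can be computed exactly by brute force. The sketch below does this under one simplifying assumption: a feature that is "absent" is replaced by its value in a fixed baseline input. SHAP itself approximates this average and handles absent features by averaging over background data. The function name is illustrative.

```python
from itertools import permutations

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    f: model taking a list of feature values; x: instance to explain;
    baseline: values used for features not yet added (an assumption).
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = f(current)
        for j in order:
            current[j] = x[j]          # add feature j to the coalition
            new = f(current)
            phi[j] += new - prev       # marginal contribution of feature j
            prev = new
    return [p / len(orders) for p in phi]

# Toy model with one additive term and one interaction term.
model = lambda v: 2 * v[0] + v[1] * v[2]
phi = shapley_values(model, [1, 2, 3], [0, 0, 0])
```

Here `phi` is `[2.0, 3.0, 3.0]`: the interaction `v[1] * v[2] = 6` is split evenly between its two features, and the attributions add up to `model(x) - model(baseline) = 8`. Enumeration costs n! model evaluations per ordering pass, so it is only practical for very small n.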


[ Ribeiro et al. 2018 ] 


Anchors 


* Perturb a given instance x to generate a local
neighborhood


* Identify an "anchor" rule which has the maximum
coverage of the local neighborhood and also achieves a
high precision.


Salary Prediction


(a) Instance and prediction

28 < Age < 37
Workclass = Private
Education = High School grad
Marital Status = Married
Occupation = Blue-Collar
Relationship = Husband
Race = White
Sex = Male
Capital Gain = None
Capital Loss = Low
Hours per week < 40.00
Country = United-States

P(Salary > $50K) = 0.57


(b) LIME explanation (Less than $50K vs. More than $50K)

Married
Capital Gain = None
0.23
Hours per week <= 40
0.16
Occupation = Blue Collar
0.15
Ed = High School grad
0.10


[ Ribeiro et al. 2018 ]


(c) An anchor explanation

IF Country = United-States AND Capital Loss = Low
AND Race = White AND Relationship = Husband
AND Married AND 28 < Age < 37
AND Sex = Male AND High School grad
AND Occupation = Blue-Collar
THEN PREDICT Salary > $50K


Saliency Map Overview


Input → Model → Predictions (Junco Bird)


What parts of the input are most relevant for the model's prediction: 'Junco Bird'?
Modern DNN Setting


Input → Model → Predictions (Junco Bird), read from a class specific logit


Input-Gradient


$\nabla_x F_i(x) \in \mathbb{R}^d$: gradient of the class specific logit $F_i$ with respect to the input.


Visualize as a heatmap


Challenges
* Visually noisy & difficult to interpret.
* 'Gradient saturation' (Shrikumar et al. 2017).


Baehrens et al. 2010; Simonyan et al. 2014.

SmoothGrad


Input → Model → Predictions (Junco Bird)


$\frac{1}{N} \sum_{k=1}^{N} \nabla_x F_i(x + \epsilon_k)$, where each $\epsilon_k$ is Gaussian noise


Average Input-gradient of 'noisy' inputs.


Smilkov et al. 2017
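
A toy sketch of SmoothGrad, assuming a one-layer stand-in model F(x) = tanh(w·x) whose input-gradient is known in closed form. In practice the gradient would come from automatic differentiation of a real network; the weights, function names, and defaults below are invented for illustration.

```python
import numpy as np

W = np.array([1.0, -2.0, 0.5])  # hypothetical model weights

def input_gradient(x):
    """Gradient of F(x) = tanh(W . x) with respect to the input x."""
    return (1.0 - np.tanh(x @ W) ** 2) * W

def smoothgrad(x, n_samples=50, sigma=0.1, seed=0):
    """Average the input-gradient over Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal((n_samples, x.size))
    grads = np.array([input_gradient(x + eps) for eps in noise])
    return grads.mean(axis=0)

x = np.array([1.0, 0.5, 2.0])
saliency = smoothgrad(x)
```

With `sigma=0.0` the result reduces to the plain input-gradient, which is a quick sanity check on any implementation.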


Integrated Gradients


Input → Model → Predictions (Junco Bird)


Path integral: 'sum' of interpolated gradients

$(x - \tilde{x}) \odot \int_{\alpha=0}^{1} \frac{\partial F(\tilde{x} + \alpha (x - \tilde{x}))}{\partial x} \, d\alpha$, where $\tilde{x}$ is the baseline input


Sundararajan et al. 2017
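
The path integral is usually approximated by a Riemann sum over interpolated inputs. The sketch below does this for a toy model F(x) = tanh(w·x) with a closed-form gradient; the model, weights, step count, and all-zeros baseline are assumptions for illustration.

```python
import numpy as np

W = np.array([1.0, -2.0, 0.5])  # hypothetical model weights

def F(x):
    return np.tanh(x @ W)

def grad_F(x):
    """Gradient of F with respect to the input x."""
    return (1.0 - np.tanh(x @ W) ** 2) * W

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_F(baseline + a * (x - baseline)) for a in alphas])
    # Average gradient along the straight path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros(3)
attributions = integrated_gradients(x, baseline)
```

A useful check is the completeness property: the attributions should sum to approximately `F(x) - F(baseline)`.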


Gradient-Input


Input → Model → Predictions (Junco Bird)


$\nabla_x F_i(x) \odot x$


Element-wise product of input-gradient and input.


Shrikumar et al. 2017, Ancona et al. 2018.


Prototypes/Example Based Post hoc Explanations 


Use examples (synthetic or natural) to explain individual predictions 


* Influence Functions (Koh & Liang 2017)
- Identify instances in the training set that are responsible for the prediction of a
given test instance


* Activation Maximization (Erhan et al. 2009)
- Identify examples (synthetic or natural) that strongly activate a function (neuron)
of interest
Training Point Ranking via Influence Functions


Input → Model → Predictions (Junco Bird)


Which training data points have the most 'influence' on the test loss?
Training Point Ranking via Influence Functions 


Influence Function: classic tool used in robust statistics for assessing 
the effect of a sample on regression parameters (Cook & Weisberg, 1980). 


Instead of refitting model for every data point, Cook's distance provides analytical 
alternative. 


Training Point Ranking via Influence Functions
Koh & Liang (2017) extend the 'Cook's distance' insight to the modern machine learning setting.


Training sample points: $z_i = (x_i, y_i) \in \mathcal{X} \times \mathcal{Y}$; upweighted training point: $z_j = (x_j, y_j)$; test point: $z_{test}$


ERM Solution:

$\hat{\theta} := \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \ell(z_i; \theta)$

UpWeighted ERM Solution:

$\hat{\theta}_{\epsilon, z_j} := \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \ell(z_i; \theta) + \epsilon \, \ell(z_j; \theta)$

(setting $\epsilon = -\frac{1}{n}$ corresponds to removing $z_j$)


Influence of Training Point on Parameters

$\mathcal{I}_{up,params}(z_j) := \left. \frac{d \hat{\theta}_{\epsilon, z_j}}{d \epsilon} \right|_{\epsilon = 0} = -H_{\hat{\theta}}^{-1} \nabla_\theta \ell(z_j, \hat{\theta})$, where $H_{\hat{\theta}}$ is the Hessian of the empirical risk at $\hat{\theta}$


Influence of Training Point on Test-Input's loss

$\mathcal{I}_{up,loss}(z_j, z_{test}) = -\nabla_\theta \ell(z_{test}, \hat{\theta})^T H_{\hat{\theta}}^{-1} \nabla_\theta \ell(z_j, \hat{\theta})$
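
For a model where the Hessian is cheap, the influence on the test loss can be computed directly. The sketch below uses unregularized linear regression with squared loss as a stand-in, which is an assumption chosen so that everything has a closed form; for deep networks Koh & Liang rely on Hessian-vector product approximations instead. Function names are illustrative.

```python
import numpy as np

def fit(X, y):
    """ERM solution for the loss 0.5 * (x . theta - y)^2."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def influence_on_test_loss(X, y, x_test, y_test):
    """I_up,loss(z_j, z_test) for every training point z_j."""
    n = len(X)
    theta = fit(X, y)
    H = X.T @ X / n                                   # Hessian of the empirical risk
    g_test = x_test * (x_test @ theta - y_test)       # gradient of the test loss
    h_inv_g = np.linalg.solve(H, g_test)              # H^{-1} grad (H is symmetric)
    grads = X * (X @ theta - y)[:, None]              # per-example training gradients
    return -grads @ h_inv_g
```

Since removing a point corresponds to upweighting it by -1/n, the predicted change in test loss from deleting training point j is roughly `-influence[j] / n`, which can be compared against actually refitting without that point.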
Training Point Ranking via Influence Functions
Applications: 


* compute self-influence to identify mislabelled 
examples; 


* diagnose possible domain mismatch; 


* craft training-time poisoning examples. 


[ Koh & Liang 2017 ] 


Challenges and Other Approaches
Influence function challenges:
1. Scalability: computing Hessian-vector products can be tedious in practice.
2. Non-convexity: possibly loose approximation for 'deeper' networks (Basu et al. 2020).


Alternatives:
* Representer Points (Yeh et al. 2018).
* TracIn (Pruthi et al., appearing at NeurIPS 2020).


Activation Maximization 


These approaches identify examples, synthetic or natural, that 
strongly activate a function (neuron) of interest. 


Implementation Flavors:
* Search for natural examples within a specified set
(training or validation corpus) that strongly activate a
neuron of interest;
* Synthesize examples, typically via gradient descent,
that strongly activate a neuron of interest.


Feature Visualization


Dataset examples show us what neurons respond to in practice.


Optimization isolates the causes of behavior from mere correlations.
A neuron may not be detecting what you initially thought.


[Figure: dataset examples vs. optimized images for four units]
Baseball, or stripes? (mixed4a, Unit 6)
Animal faces, or snouts? (mixed4a, Unit 240)
Clouds, or fluffiness? (mixed4a, Unit 453)
Buildings, or sky? (mixed4a, Unit 492)
Counterfactual Explanations


What features need to be changed and by how much to flip a model's prediction?


[Figure: example image pair: Crested Auklet and Red Faced Cormorant]


[Goyal et al., 2019]


Counterfactual Explanations 


As ML models are increasingly deployed to make high-stakes decisions
(e.g., loan applications), it becomes important to provide recourse to
affected individuals.


Counterfactual Explanations
What features need to be changed and by how much to flip a model's prediction
(i.e., to reverse an unfavorable outcome)?


Counterfactual Explanations


Applicant → Loan Application → Predictive Model → Deny Loan


Recourse: Increase your salary by 5K & pay your credit card bills on time for next 3 months
Generating Counterfactual Explanations:
Intuition


[Figure: decision boundary]


Proposed solutions differ on:

1. How to choose among candidate counterfactuals?
2. How much access is needed to the underlying predictive model?
[ Wachter et al., 2018 ]


Take 1: Minimum Distance Counterfactuals


$\arg\min_{x'} d(x, x') \quad \text{s.t.} \quad f(x') = y'$

where $d$ is the distance metric, $x$ is the original instance, $x'$ is the counterfactual, $f$ is the predictive model, and $y'$ is the desired outcome.


Choice of distance metric dictates what kinds of counterfactuals are chosen.


Wachter et al. use normalized Manhattan distance.


[ Wachter et. al., 2018] 


Take 1: Minimum Distance Counterfactuals 


arg min d(x, x’) 


si. fa jaa 


Wachter et. al. solve a differentiable, unconstrained version of the objective 
using ADAM optimization algorithm with random restarts. 


This method requires access to gradients of the underlying predictive model. 
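
A minimal sketch of a gradient-based counterfactual search may make the need for gradient access concrete. It is an illustration under stated assumptions, not Wachter et al.'s implementation: the model is a toy logistic regression with hypothetical weights, the unconstrained objective is taken to be lam * (f(x') - y')^2 plus a scaled L1 distance, and plain subgradient descent is swapped in for Adam with random restarts. The names `predict`, `find_counterfactual`, `lam`, and `scale` are invented for this example.

```python
import math

def predict(w, b, x):
    """Toy differentiable model: logistic regression probability of the favorable class."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def find_counterfactual(w, b, x, target=1.0, lam=10.0, scale=None, lr=0.05, steps=2000):
    """Minimise lam * (f(x') - target)^2 + sum_j |x'_j - x_j| / scale_j by subgradient descent."""
    scale = scale or [1.0] * len(x)   # assumption: per-feature normalisers (e.g., a spread estimate)
    xc = list(x)
    for _ in range(steps):
        p = predict(w, b, xc)
        for j in range(len(xc)):
            # Gradient of the prediction term, propagated through the sigmoid.
            grad_pred = 2.0 * lam * (p - target) * p * (1.0 - p) * w[j]
            # Subgradient of the scaled L1 distance term.
            diff = xc[j] - x[j]
            grad_dist = ((diff > 0) - (diff < 0)) / scale[j]
            xc[j] -= lr * (grad_pred + grad_dist)
    return xc

w, b = [1.0, 2.0], -3.0          # hypothetical model parameters
x = [0.5, 0.5]                   # original instance with an unfavorable prediction
x_cf = find_counterfactual(w, b, x)
p_before, p_after = predict(w, b, x), predict(w, b, x_cf)
print(round(p_before, 3), [round(v, 2) for v in x_cf], round(p_after, 3))
```

In this toy setting the search mostly moves the feature with the larger weight, because that buys the largest change in prediction per unit of distance.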


Take 1: Minimum Distance Counterfactuals 


Person 1: If your LSAT was 34.0, you would have an average predicted score (0).

Person 2: If your LSAT was 32.4, you would have an average predicted score (0).

Person 3: If your LSAT was 33.5, and you were 'white', you would have an average predicted score (0).

Person 4: If your LSAT was 35.8, and you were 'white', you would have an average predicted score (0).

Person 5: If your LSAT was 34.9, you would have an average predicted score (0).


Not feasible to act upon these features (here, race)!
[ Ustun et al., 2019 ]


Take 2: Feasible and Least Cost Counterfactuals

Minimum distance objective:
$$\arg\min_{x'} \; d(x, x') \quad \text{s.t.} \quad f(x') = y'$$

Feasible and least cost objective:
$$\arg\min_{x' \in \mathcal{A}} \; \mathrm{cost}(x, x') \quad \text{s.t.} \quad f(x') = y'$$


- $\mathcal{A}$ is the set of feasible counterfactuals (input by end user)
- E.g., changes to race, gender are not feasible 


- Cost is modeled as total log-percentile shift 
- Changes become harder when starting off from a higher percentile value 
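
The log-percentile idea can be sketched as follows. This is a hedged illustration: the per-feature cost is assumed to take the form |log((1 - Q(x'_j)) / (1 - Q(x_j)))|, where Q is the empirical CDF of the feature, and the helper names `percentile` and `log_percentile_shift` as well as the toy population are invented for the example.

```python
import math
from bisect import bisect_right

def percentile(sorted_values, v):
    """Empirical CDF Q(v): fraction of observed values that are <= v."""
    return bisect_right(sorted_values, v) / len(sorted_values)

def log_percentile_shift(sorted_values, current, proposed, eps=1e-6):
    """Assumed cost form: |log((1 - Q(proposed)) / (1 - Q(current)))| for one feature."""
    q_cur = min(percentile(sorted_values, current), 1.0 - eps)
    q_new = min(percentile(sorted_values, proposed), 1.0 - eps)
    return abs(math.log((1.0 - q_new) / (1.0 - q_cur)))

population = list(range(1, 101))   # toy feature values 1..100, already sorted

# The same 10-percentile move costs more when starting from a higher percentile.
cost_low = log_percentile_shift(population, 50, 60)    # 50th -> 60th percentile
cost_high = log_percentile_shift(population, 85, 95)   # 85th -> 95th percentile
print(round(cost_low, 3), round(cost_high, 3))
```

A total cost for a multi-feature action would sum this quantity over the changed features.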
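[ Ustun et al., 2019 ]


Take 2: Feasible and Least Cost Counterfactuals


Minimum distance objective:
$$\arg\min_{x'} \; d(x, x') \quad \text{s.t.} \quad f(x') = y'$$

Feasible and least cost objective:
$$\arg\min_{x' \in \mathcal{A}} \; \mathrm{cost}(x, x') \quad \text{s.t.} \quad f(x') = y'$$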


* Ustun et. al. only consider the case where the model is a linear classifier 
* Objective formulated as an IP and optimized using CPLEX 


* Requires complete access to the linear classifier i.e., weight vector 


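Take 2: Feasible and Least Cost Counterfactuals


Minimum distance objective:
$$\arg\min_{x'} \; d(x, x') \quad \text{s.t.} \quad f(x') = y'$$

Feasible and least cost objective:
$$\arg\min_{x' \in \mathcal{A}} \; \mathrm{cost}(x, x') \quad \text{s.t.} \quad f(x') = y'$$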


Question: What if we have a black box or a non-linear classifier? 


Answer: generate a local linear model approximation (e.g., using LIME)
and then apply Ustun et al.'s framework


[ Ustun et. al., 2019] 


Take 2: Feasible and Least Cost Counterfactuals 


FEATURES TO CHANGE        CURRENT VALUES    REQUIRED VALUES
n credit cards            5                 3
has savings account       FALSE             TRUE
has retirement account    FALSE             TRUE


[ Mahajan et. al., 2019, Karimi et. al. 2020 ] 


Take 3: Causally Feasible Counterfactuals 


Loan Applicant Loan Applicant Predictive Model 


After 1 year 


f(x) 


Important to account for feature interactions when generating counterfactuals! 


But how?! 
[ Ustun et al., 2019 ]


Take 3: Causally Feasible Counterfactuals


$$\arg\min_{x' \in \mathcal{A}} \; \mathrm{cost}(x, x') \quad \text{s.t.} \quad f(x') = y'$$


$\mathcal{A}$ is the set of causally feasible counterfactuals permitted according to a given
Structural Causal Model (SCM).


[ Verma et. al., 2020, Pawelczyk et. al., 2020] 


Counterfactuals on Data Manifold 


* Generated counterfactuals should lie on the data manifold 


* Construct Variational Autoencoders (VAEs) to map input instances to latent 
space 


* Search for counterfactuals in the latent space


* Once a counterfactual is found, map it back to the input space using the
decoder


Approaches for Post hoc Explainability 


Local Explanations Global Explanations


- Feature Importances - Collection of Local Explanations


- Rule Based - Representation Based 
- Saliency Maps - Model Distillation 
- Prototypes/Example Based - Summaries of Counterfactuals 


- Counterfactuals 


Global Explanations 


* Explain the complete behavior of a given (black box) model
    o Provide a bird's eye view of model behavior


* Help detect big picture model biases persistent across larger subgroups
of the population


    o Impractical to manually inspect local explanations of several instances to
    ascertain big picture biases!


* Global explanations are complementary to local explanations


Local vs. Global Explanations 


Explain complete behavior of the model 


Help shed light on big picture biases 
affecting larger subgroups 


Help vet if the model, at a high level, is 
suitable for deployment 
Approaches for Post hoc Explainability


Local Explanations Global Explanations


- Feature Importances - Collection of Local Explanations


- Rule Based - Representation Based 


- Saliency Maps - Model Distillation 
- Prototypes/Example Based - Summaries of Counterfactuals 


- Counterfactuals 


Global Explanation as a Collection of Local Explanations 
How to generate a global explanation of a (black box) model? 


- Generate a local explanation for every instance in the data using 
one of the approaches discussed earlier 


- Pick a subset of k local explanations to constitute the global 
explanation 




Global Explanations from Local Feature Importances: SP-LIME 


LIME explains a single prediction 
local behavior for a single instance 


Can't examine all explanations 
Instead pick k explanations to show to the user 


Representative Diverse 
Should summarize the Should not be redundant in 
model's global behavior their descriptions 


Single explanation 


SP-LIME uses submodular optimization 
and greedily picks k explanations 
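
The greedy selection can be sketched as below, assuming the coverage objective usually associated with SP-LIME: each feature j gets a global importance I_j = sqrt(sum_i |W_ij|), and a set of explanations is scored by the total importance of the features it touches. The matrix `W` (rows are instances, columns are features, entries are local importances) and the function name `submodular_pick` are hypothetical; this is an illustration, not the reference implementation.

```python
import math

def submodular_pick(W, k):
    """Greedily choose k rows of W that maximise the summed importance of covered features."""
    n, d = len(W), len(W[0])
    importance = [math.sqrt(sum(abs(W[i][j]) for i in range(n))) for j in range(d)]
    chosen, covered = [], set()
    for _ in range(min(k, n)):
        best, best_gain = None, -1.0
        for i in range(n):
            if i in chosen:
                continue
            # Marginal gain: importance of features this explanation would newly cover.
            gain = sum(importance[j] for j in range(d) if W[i][j] != 0 and j not in covered)
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        covered |= {j for j in range(d) if W[best][j] != 0}
    return chosen

W = [
    [0.9, 0.8, 0.0, 0.0],   # explanation 0
    [0.9, 0.7, 0.0, 0.0],   # explanation 1: redundant with explanation 0
    [0.0, 0.0, 0.5, 0.4],   # explanation 2: covers different features
    [0.0, 0.1, 0.0, 0.0],   # explanation 3
]
picked = submodular_pick(W, 2)
print(picked)
```

The redundant explanation 1 is skipped in favor of explanation 2, which reflects the "representative" and "diverse" criteria on the slide.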
[ Ribeiro et al. 2018 ]


Global Explanations from Local Rule Sets: SP-Anchor 


* Use Anchors algorithm discussed earlier to obtain local rule sets 
for every instance in the data 


* Use the same procedure to greedily select a subset of k local rule
sets to correspond to the global explanation 
Approaches for Post hoc Explainability


Local Explanations Global Explanations
- Feature Importances - Collection of Local Explanations
- Rule Based - Representation Based


- Saliency Maps - Model Distillation 
- Prototypes/Example Based - Summaries of Counterfactuals 
- Counterfactuals 


Representation Based Approaches 
* Derive model understanding by analyzing intermediate representations of a DNN. 


* Determine model's reliance on 'concepts' that are semantically meaningful to 
humans. 


Representation Based Explanations 


[Figure: an input image that the model classifies as Zebra (0.97)]


How important is the notion of "stripes" for this prediction?


[Kim et al., 2018]
Representation Based Explanations: TCAV


Notation: $f_l : \mathbb{R}^n \to \mathbb{R}^m$ maps an input to the activations of layer l;
$h_{l,k} : \mathbb{R}^m \to \mathbb{R}$ maps those activations to the output for class k.

[Figure: examples of the concept "stripes" and random examples, with their layer activations]

- Train a linear classifier to separate the activations of the concept examples from those of the random examples.
- The vector orthogonal to the decision boundary, pointing towards the "stripes" class, quantifies the concept "stripes" ($v_C^l$).
- Compute derivatives by leveraging this vector to determine the importance of the notion of stripes for any given prediction.


Quantitative Testing with Concept Activation Vectors (TCAV) 


TCAV measures the sensitivity of a model's prediction to user provided 
concept using the model internal representations. 
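
The TCAV recipe can be sketched on toy data as follows. Everything here is illustrative: the two-dimensional "activations" are hand-written, the linear classifier is a small logistic regression trained by gradient descent, and the head `h(a) = a[0] * a[1]` with its analytic gradient stands in for the part of a network that maps layer activations to a class score. The function names are invented for the example.

```python
import math

def train_linear_classifier(pos, neg, lr=0.1, epochs=500):
    """Logistic regression by batch gradient descent; returns (weights, bias)."""
    d = len(pos[0])
    w, b = [0.0] * d, 0.0
    data = [(a, 1.0) for a in pos] + [(a, 0.0) for a in neg]
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for a, y in data:
            p = 1.0 / (1.0 + math.exp(-(sum(wi * ai for wi, ai in zip(w, a)) + b)))
            for j in range(d):
                gw[j] += (p - y) * a[j]
            gb += p - y
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b

def concept_activation_vector(concept_acts, random_acts):
    """Unit normal of the decision boundary, pointing towards the concept class."""
    w, _ = train_linear_classifier(concept_acts, random_acts)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

def tcav_score(grad_h, class_acts, cav):
    """Fraction of class examples whose directional derivative along the CAV is positive."""
    positive = sum(1 for a in class_acts
                   if sum(g * v for g, v in zip(grad_h(a), cav)) > 0)
    return positive / len(class_acts)

concept_acts = [(2.0, 0.5), (2.0, -0.5), (3.0, 0.5), (3.0, -0.5)]    # toy "stripes" activations
random_acts = [(0.0, 0.5), (0.0, -0.5), (-1.0, 0.5), (-1.0, -0.5)]   # toy random activations
cav = concept_activation_vector(concept_acts, random_acts)

grad_h = lambda a: (a[1], a[0])   # gradient of the toy head h(a) = a[0] * a[1]
class_acts = [(1.0, 0.5), (1.0, -0.2), (2.0, 1.0), (0.5, 0.3)]
score = tcav_score(grad_h, class_acts, cav)
print([round(v, 3) for v in cav], score)
```

With a real network, the activations and the gradient of the class logit with respect to them would come from the framework's autodiff rather than from a closed form.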


[Figure: TCAV pipeline]

$$S_{C,k,l}(x) = \nabla h_{l,k}\big(f_l(x)\big) \cdot v_C^l$$
Approaches for Post hoc Explainability


Local Explanations Global Explanations
- Feature Importances - Collection of Local Explanations
- Rule Based - Representation Based
- Saliency Maps - Model Distillation
- Prototypes/Example Based - Summaries of Counterfactuals 
- Counterfactuals 


Model Distillation for Generating Global Explanations 


Simpler, interpretable model 


which is optimized to mimic 
the model predictions 


Predictive 
Model 


Model 
Predictions 


[Tan et. al., 2019] 


Generalized Additive Models as Global Explanations 


Black Box 
Model 


Model 
Predictions 
[Tan et al., 2019]


Generalized Additive Models as Global Explanations:
Shape Functions for Predicting Bike Demand


[Figure: shape functions showing the contribution to predicted demand as a function of hour of day (00 to 23) and of temperature (Celsius)]
[Tan et al., 2019]


Generalized Additive Models as Global Explanations:
Shape Functions for Predicting Bike Demand


How does bike demand vary as a function of temperature?


[Figure: shape function of bike demand versus Temperature (Celsius)]


[Tan et. al., 2019] 


Generalized Additive Models as Global Explanations 


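Generalized Additive Model (GAM):


$$\hat{y} = h_0 + \sum_{i} h_i(x_i) + \sum_{i \neq j} h_{ij}(x_i, x_j) + \sum_{i \neq j} \sum_{j \neq k} h_{ijk}(x_i, x_j, x_k) + \cdots$$


The $h_i$ are the shape functions of individual features; the $h_{ij}$, $h_{ijk}$, ... are higher order feature interaction terms.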


Fit this model to the predictions of the black box to obtain the shape functions. 
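
A minimal sketch of this distillation step, restricted to first-order shape functions, is given below. It is an assumption-laden illustration rather than the method of Tan et al.: each shape function is a lookup table over the distinct values of a feature, fitting uses simple backfitting on residuals, and the "black box" is a hypothetical function `black_box` chosen so that the result is easy to check.

```python
def fit_additive_surrogate(X, y, n_iters=20):
    """Backfitting: h0 is the mean prediction; shapes[j][v] is feature j's contribution at value v."""
    n, d = len(X), len(X[0])
    h0 = sum(y) / n
    shapes = [dict() for _ in range(d)]
    for _ in range(n_iters):
        for j in range(d):
            sums, counts = {}, {}
            for xi, yi in zip(X, y):
                others = sum(shapes[k].get(xi[k], 0.0) for k in range(d) if k != j)
                residual = yi - h0 - others
                sums[xi[j]] = sums.get(xi[j], 0.0) + residual
                counts[xi[j]] = counts.get(xi[j], 0) + 1
            # Shape function value = mean residual among instances sharing this feature value.
            shapes[j] = {v: sums[v] / counts[v] for v in sums}
    return h0, shapes

def surrogate_predict(h0, shapes, x):
    return h0 + sum(shapes[j].get(x[j], 0.0) for j in range(len(x)))

black_box = lambda x: x[0] ** 2 + 2 * x[1]             # hypothetical model to be explained
X = [(a, b) for a in (0, 1, 2, 3) for b in (0, 1, 2)]  # probe inputs
y = [black_box(x) for x in X]                          # black-box predictions, not true labels

h0, shapes = fit_additive_surrogate(X, y)
max_error = max(abs(surrogate_predict(h0, shapes, x) - yi) for x, yi in zip(X, y))
print(h0, shapes[0], shapes[1], max_error)
```

Plotting `shapes[j]` against the feature values gives the kind of shape-function plot shown on the bike demand slides.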


[ Bastani et al., 2019 ]


Decision Trees as Global Explanations


[Figure: the black box model's predictions (Label 1, Label 2, ...) are used to fit a decision tree. Its internal nodes test features such as Age > 50, High cholesterol, Edema, and Smoker; its leaves are labeled High risk or Low risk.]
[ Lakkaraju et al., 2019 ]


Customizable Decision Sets as Global Explanations


[Diagram: Black Box Model → Model Predictions (Label 1, Label 2, ...) → decision set]

If Age < 50 and Male = Yes:
    If Past-Depression = Yes and Insomnia = No and Melancholy = No, then Healthy
    If Past-Depression = Yes and Insomnia = Yes and Melancholy = Yes and Tiredness = Yes, then Depression

If Age > 50 and Male = No:
    If Family-Depression = Yes and Insomnia = No and Melancholy = Yes and Tiredness = Yes, then Depression
    If Family-Depression = No and Insomnia = No and Melancholy = No and Tiredness = No, then Healthy

Default:
    If Past-Depression = Yes and Tiredness = No and Exercise = No and Insomnia = Yes, then Depression
    If Past-Depression = No and Rapid-Weight-Gain = Yes and Tiredness = Yes and Melancholy = Yes, then Depression
[ Lakkaraju et al., 2019 ]
Customizable Decision Sets as Global Explanations


Each outer if-clause is a Subgroup Descriptor; the inner rules beneath it are the Decision Logic for that subgroup.

If Age < 50 and Male = Yes:
    If Past-Depression = Yes and Insomnia = No and Melancholy = No, then Healthy
    If Past-Depression = Yes and Insomnia = Yes and Melancholy = Yes and Tiredness = Yes, then Depression

If Age > 50 and Male = No:
    If Family-Depression = Yes and Insomnia = No and Melancholy = Yes and Tiredness = Yes, then Depression
    If Family-Depression = No and Insomnia = No and Melancholy = No and Tiredness = No, then Healthy

Default:
    If Past-Depression = Yes and Tiredness = No and Exercise = No and Insomnia = Yes, then Depression
    If Past-Depression = No and Rapid-Weight-Gain = Yes and Tiredness = Yes and Melancholy = Yes, then Depression
[ Lakkaraju et al., 2019 ]


Customizable Decision Sets as Global Explanations


If Exercise = Yes and Smoking = No:
    If Rapid-Weight-Gain = Yes and Tiredness = Yes and Melancholy = Yes and Insomnia = Yes and Age < 50, then Depression
    If Tiredness = Yes and Melancholy = Yes and Age > 50, then Depression
    If Tiredness = No and Melancholy = No, then Healthy

If Smoking = Yes:
    If Rapid-Weight-Gain = Yes and Melancholy = Yes, then Depression
    If Tiredness = No and Insomnia = No and Melancholy = No and Rapid-Weight-Gain = No, then Healthy
    If Insomnia = Yes and Past-Depression = Yes and Tiredness = Yes, then Depression

Default:
    If Past-Depression = Yes and Tiredness = Yes and Melancholy = Yes, then Depression
    If Past-Depression = No and Rapid-Weight-Gain = Yes and Tiredness = No and Melancholy = Yes, then Depression
    If Family-Depression = Yes and Age > 50 and Male = No and Tiredness = Yes, then Depression
Customizable Decision Sets as Global Explanations:
Desiderata & Optimization Problem


Fidelity 
Describe model behavior accurately 


Unambiguity 
No contradicting explanations 


Simplicity 
Users should be able to look at the explanation 
and reason about model behavior 


Customizability 
Users should be able to understand model 
behavior across various subgroups of 
interest 


Fidelity 
Minimize number of instances for which 
explanation's label ≠ model prediction


Unambiguity 
Minimize the number of duplicate rules 
applicable to each instance 


Simplicity 
Minimize the number of conditions in rules; 
Constraints on number of rules & subgroups; 


Customizability 
Outer rules should only consist of features
of user interest (candidate set restricted) 
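
The fidelity and unambiguity terms can be made concrete with a small sketch. The rule representation (an outer subgroup condition, an inner condition, and a label), the toy records, and the function names are all assumptions for illustration; the actual objective in Lakkaraju et al. combines several such terms and is not reproduced here.

```python
def applies(condition, x):
    """A condition is a dict of feature -> required value; it applies if all entries match."""
    return all(x.get(f) == v for f, v in condition.items())

def evaluate_decision_set(rules, default_label, data, model_labels):
    """Return fidelity (agreement with the model) and the number of overlapping rule firings."""
    disagreements, overlaps = 0, 0
    for x, y_model in zip(data, model_labels):
        fired = [label for outer, inner, label in rules
                 if applies(outer, x) and applies(inner, x)]
        overlaps += max(len(fired) - 1, 0)            # extra rules firing on the same instance
        y_expl = fired[0] if fired else default_label
        disagreements += (y_expl != y_model)
    return {"fidelity": 1 - disagreements / len(data), "overlaps": overlaps}

rules = [
    ({"age_lt_50": True}, {"insomnia": False}, "Healthy"),
    ({"age_lt_50": True}, {"insomnia": True}, "Depression"),
    ({"age_lt_50": False}, {"tired": True}, "Depression"),
    ({"age_lt_50": False}, {"insomnia": True}, "Depression"),
]
data = [
    {"age_lt_50": True, "insomnia": False, "tired": False},
    {"age_lt_50": True, "insomnia": True, "tired": True},
    {"age_lt_50": False, "insomnia": True, "tired": True},    # two rules fire here
    {"age_lt_50": False, "insomnia": False, "tired": False},  # no rule fires: default applies
]
model_labels = ["Healthy", "Depression", "Depression", "Depression"]  # hypothetical black-box output

report = evaluate_decision_set(rules, "Healthy", data, model_labels)
print(report)
```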


[ Lakkaraju et al., 2019 ]


Customizable Decision Sets as Global Explanations


* The complete optimization problem is non-negative, non-normal,
non-monotone, and submodular with matroid constraints


* Solved using the well-known smooth local search algorithm (Feige
et al., 2007) with best known optimality guarantees.


Approaches for Post hoc Explainability 


Local Explanations Global Explanations 
- Feature Importances - Collection of Local Explanations 
- Rule Based - Representation Based 


- Saliency Maps - Model Distillation 
- Prototypes/Example Based - Summaries of Counterfactuals


- Counterfactuals 


[ Rawal and Lakkaraju, 2020 ] 


Counterfactual Explanations 


Predictive 
Model 


f(x) RECOURSES Decision Maker 
(or) Regulatory Authority 


How do recourses permitted by the model vary 
across various racial & gender subgroups? 
Are there any biases against certain 
demographics? 
[ Rawal and Lakkaraju, 2020 ]


Customizable Global Summaries of Counterfactuals


[Diagram: Predictive Model f(x) → DENIED → summary of recourses]

If Race = Caucasian and Gender = Male:
    If Married = No and Property = No and Has Job = No, then Married = No and Property = No and Has Job = Yes
    If Drugs = Yes and School = No and Pays Rent = No, then Drugs = No and School = No and Pays Rent = No
If Race = Caucasian and Gender = Female:
    If Married = No and Property = No and Has Job = No, then Married = No and Property = No and Has Job = Yes
    If Drugs = Yes and School = No and Pays Rent = No, then Drugs = No and School = No and Pays Rent = Yes
If Race ≠ Caucasian:
    If Married = No and Property = No and Has Job = No, then Married = No and Property = Yes and Has Job = Yes
    If Drugs = Yes and School = No and Pays Rent = No, then Drugs = No and School = Yes and Pays Rent = Yes


How do recourses permitted by the model vary 
across various racial & gender subgroups? 
Are there any biases against certain 
demographics? 
[ Rawal and Lakkaraju, 2020 ]


Customizable Global Summaries of Counterfactuals


Omg! This model is biased. It requires certain demographics to "act upon" a lot
more features than others.

Each outer if-clause is a Subgroup Descriptor; the inner rules beneath it are the Recourse Rules for that subgroup.

If Race = Caucasian and Gender = Male:
    If Married = No and Property = No and Has Job = No, then Married = No and Property = No and Has Job = Yes
    If Drugs = Yes and School = No and Pays Rent = No, then Drugs = No and School = No and Pays Rent = No

If Race = Caucasian and Gender = Female:
    If Married = No and Property = No and Has Job = No, then Married = No and Property = No and Has Job = Yes
    If Drugs = Yes and School = No and Pays Rent = No, then Drugs = No and School = No and Pays Rent = Yes

If Race ≠ Caucasian:
    If Drugs = Yes and School = No and Pays Rent = No, then Drugs = No and School = Yes and Pays Rent = Yes
    If Married = No and Property = No and Has Job = No, then Married = No and Property = Yes and Has Job = Yes
Customizable Global Summaries of Counterfactuals:
Desiderata & Optimization Problem


Recourse Correctness 
Prescribed recourses should obtain desirable outcomes 


Recourse Coverage 
(Almost all) applicants should be provided with recourses 


Minimal Recourse Costs 
Acting upon a prescribed recourse 
should not be impractical or terribly expensive 


Interpretability of Summaries 
Summaries should be readily understandable to 


stakeholders (e.g., decision makers/regulatory authorities). 


Customizability 
Stakeholders should be able to understand model behavior 
across various subgroups of interest 


Recourse Correctness 
Minimize number of applicants for whom prescribed recourse 
does not lead to desired outcome 


Recourse Coverage 
Minimize number of applicants for whom recourse does not exist 
(i.e., satisfy no rule). 


Minimal Recourse Costs 
Minimize total feature costs as well as magnitude of changes 
in feature values 


Interpretability of Summaries 
Constraints on # of rules, # of conditions in rules & # of subgroups 


Customizability 
Outer rules should only consist of features of stakeholder interest
(candidate set restricted) 


[ Rawal and Lakkaraju, 2020 ] 


Customizable Global Summaries of Counterfactuals 
* The complete optimization problem is non-negative, non-normal,
non-monotone, and submodular with matroid constraints


* Solved using the well-known smooth local search algorithm (Feige
et al., 2007) with best known optimality guarantees.


Breakout Groups 


* What concepts/ideas/approaches from our morning discussion stood out to you?


* We discussed different basic units of interpretation: prototypes, rules, risk scores,
shape functions (GAMs), feature importances
    * Are some of these more suited to certain data modalities (e.g., tabular, images,
    text) than others?


* What could be some potential vulnerabilities/drawbacks of inherently
interpretable models and post hoc explanation methods?


* Given the diversity of the methods we discussed, how do we go about evaluating
inherently interpretable models and post hoc explanation methods?


Agenda 

Inherently Interpretable Models 

Post hoc Explanation Methods 

Evaluating Model Interpretations/Explanations 

Empirically & Theoretically Analyzing Interpretations/Explanations 


Future of Model Understanding 


Evaluating Model 
Interpretations/Explanations 


- Evaluating the meaningfulness or correctness of explanations 


- Diverse ways of doing this depending on the type of model 
interpretation/explanation 


- Evaluating the interpretability of explanations 


[ Doshi-Velez and Kim, 2017 ] 


Evaluating Interpretability 


Application-grounded Evaluation:     Real Humans,      Real Tasks
Human-grounded Evaluation:           Real Humans,      Simple Tasks
Functionally-grounded Evaluation:    No Real Humans,   Proxy Tasks

(Evaluations higher in this list are more specific and costly.)


Evaluating Interpretability 


- Functionally-grounded evaluation: Quantitative metrics - e.g., 
number of rules, prototypes → lower is better!


- Human-grounded evaluation: binary forced choice, forward 
simulation/prediction, counterfactual simulation 


- Application-grounded evaluation: Domain expert with exact 
application task or simpler/partial task 


Evaluating Inherently Interpretable Models 


- Evaluating the accuracy of the resulting model 
- Evaluating the interpretability of the resulting model 


- Do we need to evaluate the "correctness" or “meaningfulness” of 
the resulting interpretations? 


[Letham et. al. 2016] 


Evaluating Bayesian Rule Lists 


- A rule list classifier for stroke prediction


if hemiplegia and age > 60 then stroke risk 58.9% (53.8%-63.8%)
else if cerebrovascular disorder then stroke risk 47.8% (44.8%-50.7%)
else if transient ischaemic attack then stroke risk 23.8% (19.5%-28.4%)
else if occlusion and stenosis of carotid artery without infarction then stroke risk 15.8% (12.2%-19.6%)
else if altered state of consciousness and age > 60 then stroke risk 16.0% (12.2%-20.2%)
else if age < 70 then stroke risk 4.6% (3.9%-5.4%)
else stroke risk 8.7% (7.9%-9.6%)


[Lakkaraju et. al. 2016] 


Evaluating Interpretable Decision Sets 


- A decision set classifier for disease diagnosis 


If Respiratory-Illness = Yes and Smoker = Yes and Age > 50 then Lung Cancer
If Risk-LungCancer = Yes and Blood-Pressure > 0.3 then Lung Cancer
If Risk-Depression = Yes and Past-Depression = Yes then Depression
If BMI ≥ 0.3 and Insurance = None and Blood-Pressure > 0.2 then Depression
If Smoker = Yes and BMI > 0.2 and Age ≥ 60 then Diabetes
If Risk-Diabetes = Yes and BMI > 0.4 and Prob-Infections ≥ 0.2 then Diabetes
If Doctor-Visits > 0.4 and Childhood-Obesity = Yes then Diabetes


Evaluating Interpretability of Bayesian 
Rule Lists and Interpretable Decision Sets 


- Number of rules, predicates etc. → lower is better!


- User studies to compare interpretable decision sets to Bayesian 
Decision Lists (Letham et. al.) 


- Each user is randomly assigned one of the two models 


- 10 objective and 2 descriptive questions per user 


Interface for Objective Questions 


Yes/No Question 


In this question, you will see a set of rules which characterize various diseases. These rules have been generated by a machine learning
model to explain the properties of patients suffering from the corresponding diseases. Please take a look at the rules and answer the question 
below. 


Rules generated by a machine learning model "M1" 


If Allergies = True and Smoking = True and Irregular-Heartbeat-Symptoms = True, then Asthma
If Allergies = True and Past-Respiratory-Illness = True and High-Body-Temperature = True, then Asthma
If Smoking = True and Overweight = True and Age >= 60, then Diabetes
If Family-History-Diabetes = True and Overweight = True and Has-Frequent-Infections = True, then Diabetes
If Frequently-Visited-Doctor = True and Childhood-Obesity = True and Past-Respiratory-Illness = True, then Diabetes
If Family-History-Depression = True and Past-Depression-Issues = True and Gender = Female, then Depression
If Overweight = True and Insurance-Coverage = False and High-Blood-Pressure = True, then Depression
If Past-Respiratory-Illness = True and Age >= 50 and Smoking = True, then Lung Cancer
If Family-History-LungCancer = True and Allergies = True and High-Blood-Pressure = True, then Lung Cancer


Question: 
There is a patient with the following medical record: 

1. Past-Respiratory-Illness = True

2. Smoking = True
We do not have any other information about this patient. Please do not make any assumptions about the values of other fields.
According to the rules given by model "M1" above, can we be absolutely sure that this patient suffers from Lung Cancer?
Your Answer: 


Yes 


No 
Interface for Descriptive Questions


Descriptive Question 


In this question, you will see a set of rules which characterize various diseases. These rules have been generated by a machine learning
model to explain the properties of patients suffering from the corresponding diseases. Please take a look at the rules and answer the question 
below. 


Here, you will be asked to write a paragraph describing the properties of patients with a specific disease based on the given rules. Below we 
provide an example which can help you understand how to write a short description given a rule 


Example: 
Rule: If Overweight = False and Smoking = False, then Healthy


Description: People who do not smoke and do not have any weight problems are healthy. 


Rules generated by a machine learning model "M1" 


If Allergies = True and Smoking = True and Irregular-Heartbeat-Symptoms = True, then Asthma
If Allergies = True and Past-Respiratory-Illness = True and High-Body-Temperature = True, then Asthma
If Smoking = True and Overweight = True and Age >= 60, then Diabetes
If Family-History-Diabetes = True and Overweight = True and Has-Frequent-Infections = True, then Diabetes
If Frequently-Visited-Doctor = True and Childhood-Obesity = True and Past-Respiratory-Illness = True, then Diabetes
If Family-History-Depression = True and Past-Depression-Issues = True and Gender = Female, then Depression
If Overweight = True and Insurance-Coverage = False and High-Blood-Pressure = True, then Depression
If Past-Respiratory-Illness = True and Age >= 50 and Smoking = True, then Lung Cancer
If Family-History-LungCancer = True and Allergies = True and High-Blood-Pressure = True, then Lung Cancer


Question: 


Please write a short paragraph describing the characteristics of Asthma patients based on the rules provided above.
Please use plain English to write your description. Feel free to use multiple sentences to explain a single rule.


Your Answer: 
User Study Results


Task          Metric                     Our Approach               Bayesian Decision Lists
Descriptive   Human Accuracy             (value missing in source)  0.17
Descriptive   Avg. Time Spent (secs.)    113.4                      396.86
Descriptive   Avg. # of Words            31.11                      120.57
Objective     Human Accuracy             0.97                       0.82
Objective     Avg. Time Spent (secs.)    28.18                      36.34


Objective Questions: 17% more accurate, 22% faster; 
Descriptive Questions: 74% fewer words, 71% faster. 


[Jain and Wallace, 2019] 


Evaluating Prototype and Attention Layers 


- Are prototypes and attention weights always meaningful? 


- Do attention weights correlate with other measures of feature
importance? E.g., gradients


- Would alternative attention weights yield different predictions? 


[Agarwal et. al., 2022] 


Evaluating Post hoc Explanations 


- Evaluating the faithfulness (or correctness) of post hoc 
explanations 


- Evaluating the stability of post hoc explanations 
- Evaluating the fairness of post hoc explanations 


- Evaluating the interpretability of post hoc explanations 


Evaluating Faithfulness of Post hoc 
Explanations - Ground Truth 


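$$\text{FeatureAgreement}(E_a, E_b, k) = \frac{\left| \text{top\_features}(E_a, k) \cap \text{top\_features}(E_b, k) \right|}{k}$$

$$\text{RankAgreement}(E_a, E_b, k) = \frac{\left| \bigcup_{s \in S} \{ s \mid s \in \text{top\_features}(E_a, k) \wedge s \in \text{top\_features}(E_b, k) \wedge \text{rank}(E_a, s) = \text{rank}(E_b, s) \} \right|}{k}$$

$$\text{SignAgreement}(E_a, E_b, k) = \frac{\left| \bigcup_{s \in S} \{ s \mid s \in \text{top\_features}(E_a, k) \wedge s \in \text{top\_features}(E_b, k) \wedge \text{sign}(E_a, s) = \text{sign}(E_b, s) \} \right|}{k}$$

$$\text{SignedRankAgreement}(E_a, E_b, k) = \frac{\left| \bigcup_{s \in S} \{ s \mid s \in \text{top\_features}(E_a, k) \wedge s \in \text{top\_features}(E_b, k) \wedge \text{sign}(E_a, s) = \text{sign}(E_b, s) \wedge \text{rank}(E_a, s) = \text{rank}(E_b, s) \} \right|}{k}$$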
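
The four top-k agreement metrics can be sketched in a few lines. The sketch assumes that an explanation is a dict from feature name to signed attribution, that features are ranked by absolute attribution, and that one of the two explanations plays the role of the ground truth; the feature names and values are made up for illustration.

```python
def top_features(E, k):
    """The k features with the largest absolute attribution, most important first."""
    return sorted(E, key=lambda s: -abs(E[s]))[:k]

def sign(v):
    return (v > 0) - (v < 0)

def agreement(Ea, Eb, k, same_rank=False, same_sign=False):
    """Fraction of the top-k features shared by both explanations, optionally
    also requiring the same rank position and/or the same attribution sign."""
    ta, tb = top_features(Ea, k), top_features(Eb, k)
    shared = [s for s in ta if s in tb]
    if same_rank:
        shared = [s for s in shared if ta.index(s) == tb.index(s)]
    if same_sign:
        shared = [s for s in shared if sign(Ea[s]) == sign(Eb[s])]
    return len(shared) / k

Ea = {"age": 0.5, "income": -0.3, "debt": 0.2, "zip": 0.05}    # e.g., ground-truth attributions
Eb = {"age": 0.4, "income": 0.35, "zip": 0.3, "debt": -0.01}   # e.g., a post hoc explanation

feature_agreement = agreement(Ea, Eb, k=3)
rank_agreement = agreement(Ea, Eb, k=3, same_rank=True)
sign_agreement = agreement(Ea, Eb, k=3, same_sign=True)
signed_rank_agreement = agreement(Ea, Eb, k=3, same_rank=True, same_sign=True)
print(feature_agreement, rank_agreement, sign_agreement, signed_rank_agreement)
```

Here both explanations share "age" and "income" in their top 3 at the same ranks, but they disagree on the sign of "income", so the sign-aware metrics drop to 1/3.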
Evaluating Faithfulness of Post hoc
Explanations - Ground Truth


Spearman rank correlation coefficient computed over features of interest


$$\text{RankCorrelation}(E_a, E_b, F) = r_s\big(\text{Ranking}(E_a, F), \text{Ranking}(E_b, F)\big)$$


$$\text{PairwiseRankAgreement}(E_a, E_b, F) = \frac{\sum_{i < j} \mathbb{1}\left[\text{RelativeRanking}(E_a, f_i, f_j) = \text{RelativeRanking}(E_b, f_i, f_j)\right]}{\binom{|F|}{2}}$$
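
Both ranking-based metrics can be sketched as follows, again assuming explanations are dicts of signed attributions ranked by absolute value. Spearman's coefficient uses the textbook no-ties formula; the example data are invented.

```python
from itertools import combinations

def ranking(E, features):
    """Rank features by decreasing absolute attribution (1 = most important)."""
    ordered = sorted(features, key=lambda f: -abs(E[f]))
    return {f: r for r, f in enumerate(ordered, start=1)}

def rank_correlation(Ea, Eb, features):
    """Spearman's r_s via the no-ties formula 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    ra, rb = ranking(Ea, features), ranking(Eb, features)
    n = len(features)
    d2 = sum((ra[f] - rb[f]) ** 2 for f in features)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def pairwise_rank_agreement(Ea, Eb, features):
    """Fraction of feature pairs whose relative order is the same in both explanations."""
    ra, rb = ranking(Ea, features), ranking(Eb, features)
    pairs = list(combinations(features, 2))
    same = sum((ra[f] < ra[g]) == (rb[f] < rb[g]) for f, g in pairs)
    return same / len(pairs)

Ea = {"age": 0.5, "income": -0.3, "debt": 0.2, "zip": 0.05}
Eb = {"age": 0.4, "income": 0.35, "zip": 0.3, "debt": -0.01}
F = ["age", "income", "debt", "zip"]

r_s = rank_correlation(Ea, Eb, F)
pra = pairwise_rank_agreement(Ea, Eb, F)
print(r_s, pra)
```

Only the pair ("debt", "zip") is ordered differently by the two explanations, so five of the six pairs agree.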
Evaluating Faithfulness of Post hoc
Explanations - Explanations as Models


- If the explanation is itself a model (e.g., linear model fit by LIME), 
we can compute the fraction of instances for which the labels 
assigned by explanation model match those assigned by the 
underlying model 
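
As a minimal sketch of this fidelity computation, the snippet below compares a hypothetical non-linear black box with a hypothetical linear surrogate on a handful of instances. Both models and the data points are invented for illustration.

```python
def fidelity(black_box, surrogate, instances):
    """Fraction of instances on which the explanation model reproduces the black-box label."""
    matches = sum(black_box(x) == surrogate(x) for x in instances)
    return matches / len(instances)

black_box = lambda x: x[0] * x[1] > 0.25     # hypothetical underlying model
surrogate = lambda x: x[0] + x[1] > 1.0      # hypothetical linear explanation model

instances = [(0.9, 0.9), (0.1, 0.1), (0.6, 0.5), (0.9, 0.3), (1.0, 0.2), (0.3, 0.3)]
score = fidelity(black_box, surrogate, instances)
print(score)
```

The surrogate disagrees with the black box only on (1.0, 0.2), giving a fidelity of 5/6 on this sample.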


Evaluating Faithfulness of Post hoc 
Explanations 


- What if we do not have any ground truth? 


- What if explanations cannot be considered as models that output 
predictions? 


[ Qi, Khorram, Fuxin, 2020 ] 


How important are selected features? 


* Deletion: remove important features and see what happens.


[Plot: prediction probability versus % of pixels deleted]


[ Qi, Khorram, Fuxin, 2020 ] 


How important are selected features? 


* Insertion: add important features and see what happens.


[Plot: prediction probability versus % of pixels inserted]
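
The deletion and insertion procedures can be sketched on a toy model as follows. The assumptions are loud ones: the "model" is a hand-written linear scorer rather than an image classifier, "removing" a feature means replacing it with a baseline value, and the area under each curve is used as a summary score (lower is better for deletion, higher is better for insertion). All names are hypothetical.

```python
def perturbation_curve(model, x, attributions, baseline, mode="deletion"):
    """Model output as features are deleted from x (or inserted into the baseline),
    most important first according to the attributions."""
    order = sorted(range(len(x)), key=lambda j: -attributions[j])
    current = list(x) if mode == "deletion" else list(baseline)
    source = baseline if mode == "deletion" else x
    curve = [model(current)]
    for j in order:
        current[j] = source[j]
        curve.append(model(current))
    return curve

def auc(curve):
    """Trapezoidal area under the curve with the x-axis normalised to [0, 1]."""
    n = len(curve) - 1
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(n)) / n

model = lambda x: 0.1 + 0.6 * x[0] + 0.25 * x[1] + 0.05 * x[2]   # toy "prediction probability"
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]

faithful = [0.6, 0.25, 0.05]     # attributions that match the model
unfaithful = [0.05, 0.25, 0.6]   # attributions with the order reversed

del_good = auc(perturbation_curve(model, x, faithful, baseline, "deletion"))
del_bad = auc(perturbation_curve(model, x, unfaithful, baseline, "deletion"))
ins_good = auc(perturbation_curve(model, x, faithful, baseline, "insertion"))
ins_bad = auc(perturbation_curve(model, x, unfaithful, baseline, "insertion"))
print(round(del_good, 3), round(del_bad, 3), round(ins_good, 3), round(ins_bad, 3))
```

The faithful attribution makes the prediction drop quickly under deletion and rise quickly under insertion, which is the pattern the plots on these slides look for.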


[Alvarez-Melis, 2018; Agarwal et. al., 2022] 


Evaluating Stability of Post hoc 
Explanations 


- Are post hoc explanations unstable w.r.t. small input
perturbations? 


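Local Lipschitz Constant

$$\hat{L}(x_i) = \max_{x_j \in B_\epsilon(x_i)} \frac{\| f(x_i) - f(x_j) \|_2}{\| x_i - x_j \|_2}$$

Here f(x) denotes the post hoc explanation of input x, and $B_\epsilon(x_i)$ is the ball of radius $\epsilon$ around $x_i$.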
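
One way to estimate the local Lipschitz constant is by random sampling inside the ball, as sketched below. This is only an approximation from below of the maximum, and the explanation function is a stand-in: the analytic gradient of a hypothetical model g(x) = x0^2 + 3 * x1. The function name and parameters are assumptions for illustration.

```python
import math
import random

def local_lipschitz(explain, x, eps=0.1, n_samples=500, seed=0):
    """Largest observed ratio ||e(x) - e(x')||_2 / ||x - x'||_2 over sampled x' in the eps-ball."""
    rng = random.Random(seed)
    e_x = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        delta = [rng.uniform(-eps, eps) for _ in x]
        norm = math.sqrt(sum(d * d for d in delta))
        if norm == 0.0 or norm > eps:
            continue   # rejection step: keep only points inside the L2 ball
        e_p = explain([xi + di for xi, di in zip(x, delta)])
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(e_x, e_p)))
        worst = max(worst, change / norm)
    return worst

gradient_explanation = lambda x: [2.0 * x[0], 3.0]   # gradient of the toy model x0^2 + 3*x1
L_hat = local_lipschitz(gradient_explanation, [1.0, 1.0])
print(round(L_hat, 3))
```

For this toy explanation the true constant is 2, so the sampled estimate approaches 2 from below; a constant explanation would give 0.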


Evaluating Stability of Post hoc 
Explanations 


- What if the underlying model itself is unstable? 


- Relative Output Stability: Denominator accounts for changes in 
the prediction probabilities 


- Relative Representation Stability: Denominator accounts for 
changes in the intermediate representations of the underlying 
model 


[Dai et. al., 2022] 


Evaluating Fairness of Post hoc 
Explanations 


- Compute mean faithfulness/stability metrics for instances from
majority and minority groups (e.g., race A vs. race B, male vs.
female)


- If the difference between the two means is statistically
significant, then there is unfairness in the post hoc explanations


- Why/when can such unfairness occur? 
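
The group comparison above can be sketched with a permutation test on the difference in means. The slides do not say which significance test is used, so the permutation test is an assumption, as are the per-instance fidelity scores below.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Return (observed |mean difference|, p-value) under random relabeling of group membership."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical per-instance faithfulness scores of explanations for two groups.
majority = [0.91, 0.88, 0.93, 0.90, 0.92, 0.89, 0.94, 0.90]
minority = [0.78, 0.81, 0.75, 0.80, 0.79, 0.77, 0.82, 0.76]

gap, p_value = permutation_test(majority, minority)
print(round(gap, 4), p_value)
```

A small p-value here would flag a disparity in explanation quality between the two groups; it does not by itself say why the disparity arises.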


Evaluating Interpretability of Post hoc 
Explanations 


Application-grounded Evaluation:     Real Humans,      Real Tasks
Human-grounded Evaluation:           Real Humans,      Simple Tasks
Functionally-grounded Evaluation:    No Real Humans,   Proxy Tasks

(Evaluations higher in this list are more specific and costly.)


[ Ribeiro et al. 2018, Hase and Bansal 2020 ] 


Predicting Behavior (Simulation ) 


[Diagram: the user is shown data together with the classifier's predictions and explanations, and then guesses what the classifier would do on new data]
[ Poursabzi-Sangdeh et al. 2018 ]


Predicting Behavior (Simulation)


[Screenshot: the participant is asked "What do you think the model will predict?" (example answer shown: $800,000) and "How confident are you the model will predict this?", on a scale from "It's likely the model will predict something else" to "I'm confident the model will predict this".]


(a) Step 1: Participants were asked to guess the model's prediction and state their confidence. 
[ Lai and Tan, 2019 ]


Human-AI Collaboration


* Are Explanations Useful for Making Decisions?
* For tasks where the algorithms are not reliable by themselves


[Figure: a spectrum from full human agency (decision making with no assistance) to full automation, with intermediate points such as showing machine predicted labels, and showing machine predicted labels and suggesting high accuracy]


[ Lai and Tan, 2019 ] 


Human-AI Collaboration 


* Deception Detection: Identify fake reviews online 
* Are Humans better detectors with explanations? 


Note: The highlighted words are important words which machine learning classifiers use to decide if a review 
is genuine or deceptive. The below scale shows level of importance of each word. 


[Figure: color scale from Least Important to Most Important]


I would not stay at this hotel again. The rooms had a fowl odor. It seemed as though the carpets have never been 
cleaned. The neighborhood was also less than desirable. The housekeepers seemed to be snooping around while 
they were cleaning the rooms. I will say that the front desk staff was friendly albeit slightly dimwitted. 


Can we improve the accuracy of decisions 
using feature attribution-based explanations? 


- Prediction Problem: Is a given patient likely to be diagnosed with 
breast cancer within 2 years? 


- User studies carried out with about 78 doctors (Residents, 
Internal Medicine) 


- Each doctor looks at 10 patient records from historical data and 
makes predictions for each of them. 
[Figure: accuracy of doctors' decisions across study conditions. Reported values: Accuracy 78.32%; Accuracy 82.02% (with model output "At Risk (0.91)"); Accuracy 93.11% (with model output "At Risk (0.91)"); Model Accuracy 88.92%.]


Challenges of Evaluating Interpretable 
Models/Post hoc Explanation Methods 


* Evaluating interpretations/explanations is still an ongoing endeavor 


* Parameter settings heavily influence the resulting 
interpretations/explanations 


* Diversity of explanation/interpretation methods → diverse metrics 


* User studies are not consistent 
  - Affected by choice of: UI, phrasing, visualization, population, incentives, ... 


* All of the above lead to conflicting findings 


Open Source Tools for Quantitative 
Evaluation 


* Interpretable models: https://github.com/interpretml/interpret 


* Post hoc explanation methods: OpenXAI (https://open-xai.github.io/): 22 
metrics (faithfulness, stability, fairness); public dashboards comparing 
various explanation methods on different metrics; 11 lines of code to 
evaluate explanation quality 


* Other XAI libraries: Captum, Quantus, shap bench, ERASER (NLP) 


Agenda 

Inherently Interpretable Models 

Post hoc Explanation Methods 

Evaluating Model Interpretations/Explanations 

Empirically & Theoretically Analyzing Interpretations/Explanations 


Future of Model Understanding 


Empirically Analyzing 
Interpretations/Explanations 


* A lot of recent work focuses on analyzing the behavior of post hoc explanation methods. 


* Empirical studies analyze the faithfulness, stability, fairness, adversarial 
vulnerabilities, and utility of post hoc explanation methods. 


* Several studies demonstrate limitations of existing post hoc methods. 


Limitations: Faithfulness 


Model parameter randomization test 
Cascading randomization 


Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018. 
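
The idea behind the model parameter randomization test with cascading randomization can be illustrated on a toy network: re-initialize weights from the top layer downward and check how much the saliency map changes. The sketch below is an assumption-laden stand-in, not the authors' setup: it uses a hypothetical two-layer tanh network with random weights, plain gradients as the saliency method, and Spearman rank correlation as the similarity measure.

```python
import math
import random


def make_layer(n_out, n_in, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def gradient_saliency(w1, w2, x):
    """Gradient of the scalar output w2 . tanh(w1 x) with respect to the input x."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in w1]
    gate = [1.0 - math.tanh(p) ** 2 for p in pre]  # derivative of tanh
    return [sum(w2[h] * gate[h] * w1[h][i] for h in range(len(w1)))
            for i in range(len(x))]


def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation (assumes no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)
w1 = make_layer(5, 6, rng)                     # hidden layer of the "trained" model
w2 = [rng.gauss(0.0, 1.0) for _ in range(5)]   # output layer of the "trained" model
x = [rng.gauss(0.0, 1.0) for _ in range(6)]
original = gradient_saliency(w1, w2, x)

# Cascading randomization: re-initialize the top layer first, then the layer below it.
w2_random = [rng.gauss(0.0, 1.0) for _ in range(5)]
top_randomized = gradient_saliency(w1, w2_random, x)
w1_random = make_layer(5, 6, rng)
all_randomized = gradient_saliency(w1_random, w2_random, x)

similarity = {
    "no randomization": spearman(original, original),
    "top layer randomized": spearman(original, top_randomized),
    "all layers randomized": spearman(original, all_randomized),
}
```

A saliency method that depends on the model should show similarity dropping as more layers are randomized; a method whose maps stay similar after randomization fails this sanity check.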


Limitations: Stability 


Are post-hoc explanations unstable w.r.t. small non-adversarial input perturbations? 


Local Lipschitz Constant 


$$\hat{L}(x_i) = \operatorname*{argmax}_{x_j \in B_\epsilon(x_i)} \frac{\lVert f(x_i) - f(x_j) \rVert_2}{\lVert x_i - x_j \rVert_2}$$

where f is the explanation function (LIME, SHAP, Gradient, etc.), which takes the input, the model, and hyperparameters, and x_i is the input. 


Alvarez-Melis, David et al. "On the robustness of interpretability methods." WHI, ICML, 2018. 
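
A local Lipschitz estimate of this kind can be approximated by sampling points near the input and keeping the largest ratio between the change in the explanation and the change in the input. The sketch below makes several assumptions that are not in the slides: it samples from an L-infinity box rather than a ball, it uses random search rather than an optimizer, and the explanation function is a hypothetical one (the gradient of f(x) = x0^2 + 3*x1).

```python
import math
import random


def explanation(x):
    """Hypothetical explanation function: gradient of f(x) = x0^2 + 3*x1."""
    return [2.0 * x[0], 3.0]


def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def local_lipschitz_estimate(explain, x, epsilon, n_samples=1000, seed=0):
    """Largest observed ratio ||e(x) - e(x_j)|| / ||x - x_j|| over sampled neighbors."""
    rng = random.Random(seed)
    base = explain(x)
    best = 0.0
    for _ in range(n_samples):
        # Sample a neighbor from the box of half-width epsilon around x.
        x_j = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
        dist = l2_distance(x, x_j)
        if dist == 0.0:
            continue
        best = max(best, l2_distance(base, explain(x_j)) / dist)
    return best


estimate = local_lipschitz_estimate(explanation, [1.0, -0.5], epsilon=0.1)
```

For this toy explanation the ratio can never exceed 2, so the estimate approaches 2 from below as more neighbors are sampled. Larger estimates indicate a less stable explanation method around that input.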


Limitations: Stability 


* Perturbation approaches like LIME can be unstable. 


[Figure: Lipschitz estimate by method (Saliency, Grad*Input, Int.Grad., ε-LRP, Occlusion, LIME). Estimate for 100 tests for an MNIST model.]


Alvarez-Melis, David et al. "On the robustness of interpretability methods." WHI, ICML, 2018. 


[Slack et. al., 2020] 


Limitations: Stability - Problem is Worse! 


When you repeatedly run LIME on the same instance, you get different explanations (blue region) 


Post-hoc Explanations are Fragile 


Post-hoc explanations can be easily manipulated. 


[Figure: Original Image and Manipulated Image]
Adversarial Attacks on Explanations 


Minimally modify the input with a small perturbation without 
changing the model prediction. 


$$\operatorname*{arg\,max}_{\delta} \; D\big(I(x_t; N),\, I(x_t + \delta; N)\big)$$

$$\text{subject to: } \lVert \delta \rVert_\infty \leq \epsilon, \qquad \text{Prediction}(x_t + \delta; N) = \text{Prediction}(x_t; N)$$

Here I(x; N) is the interpretation of network N at input x, and D is a distance between interpretations. 
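
The constrained problem above (maximize the change in the interpretation, keep the perturbation within an L-infinity budget, keep the prediction unchanged) can be sketched with plain random search. This is a simplified stand-in and not the attack procedure from the literature, which typically uses gradient-based iterations. The tiny network, its weights, and the choice of gradient as the interpretation are all hypothetical.

```python
import math
import random

# Hypothetical weights of a tiny tanh network with a scalar score.
HIDDEN = [[1.5, -2.0, 0.5], [-1.0, 0.8, 2.0]]
OUTPUT = [1.2, -0.7]


def score(x):
    return sum(v * math.tanh(sum(w * xi for w, xi in zip(row, x)))
               for v, row in zip(OUTPUT, HIDDEN))


def predict(x):
    return 1 if score(x) > 0 else 0


def interpretation(x):
    """Gradient of the score with respect to the input (a simple saliency map)."""
    grads = [0.0] * len(x)
    for v, row in zip(OUTPUT, HIDDEN):
        h = math.tanh(sum(w * xi for w, xi in zip(row, x)))
        for i, w in enumerate(row):
            grads[i] += v * (1.0 - h * h) * w
    return grads


def random_search_attack(x, epsilon, n_trials=2000, seed=0):
    """Find delta with ||delta||_inf <= epsilon that keeps the prediction
    but moves the interpretation as far as possible (L2 distance)."""
    rng = random.Random(seed)
    base_pred, base_interp = predict(x), interpretation(x)
    best_delta, best_dist = [0.0] * len(x), 0.0
    for _ in range(n_trials):
        delta = [rng.uniform(-epsilon, epsilon) for _ in x]
        x_adv = [xi + di for xi, di in zip(x, delta)]
        if predict(x_adv) != base_pred:
            continue  # constraint violated: the prediction changed
        dist = math.dist(base_interp, interpretation(x_adv))
        if dist > best_dist:
            best_delta, best_dist = delta, dist
    return best_delta, best_dist


x_t = [0.4, -0.3, 0.2]
delta, interpretation_shift = random_search_attack(x_t, epsilon=0.1)
x_attacked = [xi + di for xi, di in zip(x_t, delta)]
```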
Adversarial Classifiers to fool LIME & SHAP 


Scaffolding attack used to hide classifier dependence on gender. 


[Figure: % occurrence of each feature (Gender, Loan Rate % Income, All Others) at each feature importance rank, in three panels: Biased Classifier f, With LIME Attack, With SHAP Attack.]


Slack and Hilgard et al. 2020 


Vulnerabilities of LIME/SHAP: Intuition 


[Figure: PCA projection of the original COMPAS instances and the perturbations generated from them; the two form visibly different distributions.]


Adversaries can exploit this and build a classifier that is biased 
on in-sample data points and unbiased on OOD samples! 


Building Adversarial Classifiers 
Setting: 


- Adversary wants to deploy a biased classifier f in real 
world. 


* E.g., uses only race to make decisions 


- Adversary must provide black box access to customers 
and regulators who may use post hoc techniques 
(GDPR). 


- Goal of adversary is to fool post hoc explanation 
techniques and hide underlying biases of f 
Building Adversarial Classifiers 


- Input: Adversary provides us with the biased classifier f and an 
input dataset X sampled from the real-world input distribution X_dist. 


- Output: Scaffolded classifier e, which behaves exactly like f when 
making predictions on instances sampled from X_dist, but will not 
reveal underlying biases of f when probed with 
perturbation-based post hoc explanation techniques. 


- e is the adversarial classifier 


- Adversarial classifier e can be defined as: 

$$e(x) = \begin{cases} f(x), & \text{if } x \in \mathcal{X}_{dist} \\ \psi(x), & \text{otherwise} \end{cases}$$


- f is the biased classifier input by the adversary. 


- ψ is the unbiased classifier (e.g., only uses features uncorrelated 
to sensitive attributes) 
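
The scaffolding construction can be sketched as a thin wrapper that routes each query either to the biased classifier or to the unbiased one, depending on whether the query looks like real data. Everything below is hypothetical: the feature names, both classifiers, and in particular the hand-written in-distribution rule, which stands in for a trained detector that separates real instances from perturbed ones.

```python
def biased_classifier(x):
    """f: decides using only the sensitive attribute."""
    return 1 if x["race"] == 1 else 0


def unbiased_classifier(x):
    """psi: decides using a feature assumed to be uncorrelated with race."""
    return 1 if x["income"] > 50 else 0


def looks_in_distribution(x):
    """Hypothetical stand-in for a trained out-of-distribution detector.
    Assumes that in real data rent is close to 30% of income, a relationship
    that independent feature perturbations tend to break."""
    return abs(x["rent"] - 0.3 * x["income"]) < 5


def scaffolded_classifier(x):
    """e: behaves like f on real-looking inputs and like psi on everything else."""
    if looks_in_distribution(x):
        return biased_classifier(x)
    return unbiased_classifier(x)


real_instance = {"race": 1, "income": 40, "rent": 12}
perturbed_instance = {"race": 1, "income": 40, "rent": 30}

real_output = scaffolded_classifier(real_instance)
perturbed_output = scaffolded_classifier(perturbed_instance)
```

Because perturbation-based explainers mostly query the model on perturbed instances, they mostly observe the unbiased branch, so the explanation attributes little importance to the sensitive attribute even though deployed predictions depend on it.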


Limitations: Stability 


Post-hoc explanations can be unstable to small, non-adversarial 
perturbations to the input. 
Sensitivity to Hyperparameters 


Explanations can be highly sensitive to hyperparameters 
such as random seed, number of perturbations, patch size, etc. 


[Figure: attribution maps for one input image under varied hyperparameters: LIME (random seed), SP (patch size: 5x5, 29x29, 53x53), MP (blur radius), SG (sample size: 50, 200, 800).]


Bansal, Agarwal, & Nguyen, 2020. 

Utility: High fidelity explanations can mislead 


In a bail adjudication task, misleading high-fidelity explanations 
improve end-user (domain experts) trust. 


True Classifier relies on race 

If Race ≠ African American: 
    If Prior-Felony = Yes and Crime-Status = Active, then Risky 
    If Prior-Convictions = 0, then Not Risky 

If Race = African American: 
    If Pays-rent = No and Gender = Male, then Risky 
    If Lives-with-Partner = No and College = No, then Risky 
    If Age ≥ 35 and Has-Kids = Yes, then Not Risky 
    If Wages ≥ 70K, then Not Risky 

Default: Not Risky 


High fidelity 'misleading' explanation 

If Current-Offense = Felony: 
    If Prior-FTA = Yes and Prior-Arrests > 1, then Risky 
    If Crime-Status = Active and Owns-House = No and Has-Kids = No, then Risky 
    If Prior-Convictions = 0 and College = Yes and Owns-House = Yes, then Not Risky 

If Current-Offense = Misdemeanor and Prior-Arrests > 1: 
    If Prior-Jail-Incarcerations = Yes, then Risky 
    If Has-Kids = Yes and Married = Yes and Owns-House = Yes, then Not Risky 
    If Lives-with-Partner = Yes and College = Yes and Pays-Rent = Yes, then Not Risky 

If Current-Offense = Misdemeanor and Prior-Arrests < 1: 
    If Has-Kids = No and Owns-House = No and Prior-Jail-Incarcerations = Yes, then Risky 
    If Age > 50 and Has-Kids = Yes and Prior-FTA = No, then Not Risky 

Default: Not Risky 


Lakkaraju & Bastani 2019. 


[Kaur et al., 2020; Bucinca et al., 2020] 


Utility: Post hoc Explanations Instill Over Trust 


Domain experts and end users seem to be over trusting 
explanations & the underlying models based on explanations 


- Data scientists over trusted explanations without even comprehending 
them: "Participants trusted the tools because of their visualizations and 
their public availability" 


[Kaur et al., 2020] 


Responses from Data Scientists Using Explainability Tools 
(GAM and SHAP) 


"I didn't fully grasp what SHAP values were. This is 
a pretty popular tool and I get the log-odds concept in 
general. I figure they were showing SHAP values for a 
reason. Maybe it's easier to judge relationships using 
log-odds instead of predicted value. Anyway, so it made 
sense I suppose." (P6, SHAP) 


"[The tool] assigns a value that is important to know, but 
it's showing that in a way that makes you misinterpret that 
value. Now I want to go back and check all my answers"... 
[later] "Okay, so, it's not showing me a whole lot more 
than what I can infer on my own. Now I'm thinking... is 
this an 'interpretability tool'?" (P4, SHAP) 


"Age 38 seems to have the highest positive influence 
on income based on the plot. Not sure why, but the 
explanation clearly shows it... makes sense." (P9, GAMs) 


"[The tool] shows visualizations of ML models, which is not 
something anything else I have worked with has done. It's very 
transparent, and that makes me trust it more" (P9, GAMs). 
Utility: Explanations for Debugging 


In a housing price prediction task, Amazon Mechanical Turk workers are 
unable to use linear model coefficients to diagnose model mistakes. 


[Figure: study interface listing apartment properties (# Bedrooms, # Bathrooms, Square footage, Total rooms, Days on the market, Maintenance fee ($), Subway distance (miles), School distance (miles)) with per-feature contributions, an adjustment term, and the model's prediction ($1,500,000). An attention message reads: "This apartment has an unusual combination of # Bedrooms and # Bathrooms. Please take the unusual configuration of this apartment into consideration when making predictions."]


Poursabzi-Sangdeh et al. 2019 


Utility: Explanations for Debugging 


In a dog breeds classification task, users familiar with machine 
learning rely on labels, instead of saliency maps, for diagnosing 
model errors. 


Adebayo et al., 2020. 


Conflicting Evidence on Utility of Explanations 


* Mixed evidence: 
  - simulation and benchmark studies show that 
    explanations are useful for debugging; 
  - however, recent user studies show limited utility in 
    practice. 


* Rigorous user studies and pilots with end-users can 
continue to help provide feedback to researchers on what 
to address (see: Alqaraawi et al. 2020, Bhatt et al. 2020 & 
Kaur et al. 2020). 


Utility: Disagreement Problem in XAI 


Study to understand: 


- If and how often feature attribution based explanation methods disagree 
with each other in practice 


- What constitutes disagreement between these explanations, and how to 
formalize the notion of explanation disagreement based on practitioner 
inputs? 


- How do practitioners resolve explanation disagreement? 


Krishna and Han et al., 2022 


Practitioner Inputs on Explanation Disagreement 


- 30 minute semi-structured interviews with 25 data scientists 


- 84% of participants said they often encountered disagreement 
between explanation methods 


- Characterizing disagreement: 
  - Top features are different 
  - Ordering among top features is different 
  - Direction of top feature contributions is different 
  - Relative ordering of features of interest is different 
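
The first three notions of disagreement listed above can be turned into simple top-k agreement scores between two attribution vectors. The sketch below is a simplified reading of such metrics rather than the exact definitions from the cited study, and the attribution values are hypothetical.

```python
def top_k(attributions, k):
    """Names of the k features with the largest absolute attribution, in order."""
    return sorted(attributions, key=lambda name: abs(attributions[name]),
                  reverse=True)[:k]


def feature_agreement(a, b, k):
    """Fraction of top-k features shared by both explanations."""
    return len(set(top_k(a, k)) & set(top_k(b, k))) / k


def rank_agreement(a, b, k):
    """Fraction of top-k positions holding the same feature in both explanations."""
    return sum(fa == fb for fa, fb in zip(top_k(a, k), top_k(b, k))) / k


def sign_agreement(a, b, k):
    """Fraction of top-k features that are shared and have the same sign."""
    shared = set(top_k(a, k)) & set(top_k(b, k))
    return sum((a[f] > 0) == (b[f] > 0) for f in shared) / k


# Hypothetical attributions from two explanation methods for one prediction.
method_one = {"age": 0.40, "income": -0.30, "debt": 0.20, "tenure": 0.05}
method_two = {"age": 0.35, "income": 0.25, "debt": -0.05, "tenure": 0.30}

fa = feature_agreement(method_one, method_two, k=3)
ra = rank_agreement(method_one, method_two, k=3)
sa = sign_agreement(method_one, method_two, k=3)
```

In this example the two methods share two of their top three features (age and income), agree only on the top-ranked position, and agree on direction only for age.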


How do Practitioners Resolve Disagreements? 


- Online user study where 25 users were shown explanations that 
disagree and asked to make a choice, and explain why 


- Practitioners are choosing methods due to: 
- Associated theory or publication time (33%) 
- Explanations matching human intuition better (32%) 


- Type of data (23%) 
* E.g., LIME or SHAP are better for tabular data 


How do Practitioners Resolve Disagreements? 


Reasons that each algorithm was chosen in disagreement: 

KernelSHAP 
* [36%] SHAP is better for tabular data ("SHAP is more commonly used [than Gradient] for tabular data") 
* [25%] SHAP is more familiar ("More information present + more familiarity") 
* [14%] SHAP is a better algorithm overall ("SHAP seems more methodical than LIME", "SHAP is a more rigorous approach [than LIME] in theory") 

SmoothGrad 
* [33%] SmoothGrad paper is newer or better ("Smooth Grad is apparently more robust", "SmoothGrad is often considered improved verison of grad") 
* [58%] Reasons based on the explainability map shown ("directionality of the attributions ... [agree] with intuition", "gradient has unstability problems [, so] smoothgrad") 

LIME 
* [54%] LIME is better for tabular data ("I use LIME for structured data.") 
* [15%] LIME is more familiar/easier to interpret ("I am more familiar with LIME", "LIME is easy to interpret") 

Integrated Gradients 
* [86%] Integrated Gradients paper is better ("IG came after gradients and paper shows improvements", "integrated gradients paper showed improvements [over Gradient x Input]") 


Empirical Analysis: Summary 
Faithfulness/Fidelity 

* Some explanation methods do not reflect the underlying model. 


Fragility 

* Post-hoc explanations can be easily manipulated. 


Stability 

* Slight changes to inputs can cause large changes in explanations. 


Useful in practice? 


Theoretically Analyzing Interpretable 
Models 


Two main classes of theoretical results: 


Interpretable models learned using certain algorithms are certifiably 
optimal 
E.g., rule lists (Angelino et. al., 2018) 


No accuracy-interpretability tradeoffs in certain settings 


E.g., reinforcement learning for mazes (Mansour et. al., 2022) 


[Garreau et. al., 2020] 


Theoretical Analysis of Tabular LIME w.r.t. Linear Models 


* Theoretical analysis of LIME, where: 
  - the "black box" is a linear model 
  - data is tabular and discretized 


* Obtained closed-form solutions of the average coefficients of the 
"surrogate" model (explanation output by LIME) 


* The coefficients obtained are proportional to the gradient of the function to be 
explained 


* Local error of surrogate model is bounded away from zero with high probability 


[Agarwal et al., 2020] 


Unification and Robustness of LIME and SmoothGrad 


* C-LIME (a continuous variant of LIME) and SmoothGrad converge to the same 
explanation in expectation 


* In expectation, the resulting explanations are provably robust according to the notion 
of Lipschitz continuity 


* Finite sample complexity bounds for the number of perturbed samples required for 
SmoothGrad and C-LIME to converge to their expected output 


[Han et. al., 2022] 


Function Approximation Perspective to Characterizing 
Post hoc Explanation Methods 


- Various feature attribution methods (e.g., LIME, C-LIME, 
KernelSHAP, Occlusion, Vanilla Gradients, Gradient times Input, 
SmoothGrad, Integrated Gradients) are essentially local linear 
function approximations. 


$$g^{*} = \operatorname*{argmin}_{g \in \mathcal{G}} \; \mathbb{E}_{\xi \sim \mathcal{Z}} \; \ell(f, g, x_0, \xi)$$


. But... 


- But, they adopt different loss functions and local neighborhoods 


Explanation Method | Local Neighborhood Z around x0 | Loss Function ℓ 
C-LIME | x0 + ξ; ξ (∈ R^d) ~ Normal(0, σ²) | Squared Error 
SmoothGrad | x0 + ξ; ξ (∈ R^d) ~ Normal(0, σ²) | Gradient Matching 
Vanilla Gradients | x0 + ξ; ξ (∈ R^d) ~ Normal(0, σ²), σ → 0 | Gradient Matching 
Integrated Gradients | ξ x0; ξ (∈ R) ~ Uniform(0, 1) | Gradient Matching 
Gradients x Input | ξ x0; ξ (∈ R) ~ Uniform(a, 1), a → 1 | Gradient Matching 
LIME | x0 ⊙ ξ; ξ (∈ {0, 1}^d) ~ Exponential kernel | Squared Error 
KernelSHAP | x0 ⊙ ξ; ξ (∈ {0, 1}^d) ~ Shapley kernel | Squared Error 
Occlusion | x0 ⊙ ξ; ξ (∈ {0, 1}^d) ~ Random one-hot vectors | Squared Error 
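
To make the local function approximation view concrete, the sketch below fits two of the listed methods on the same Gaussian neighborhood: a C-LIME-style linear model trained with squared error, and a SmoothGrad-style average of gradients. The black box f, the point x0, the noise scale, and the sample size are hypothetical choices for illustration, and the least-squares solver is a small hand-written one to keep the example dependency-free.

```python
import math
import random


def f(x):
    """Hypothetical black box."""
    return math.sin(x[0]) + x[1] ** 2


def grad_f(x):
    return [math.cos(x[0]), 2.0 * x[1]]


def solve(a, b):
    """Solve a small linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def smoothgrad(x0, samples):
    """Average gradient of f over the neighborhood (gradient matching)."""
    grads = [grad_f(z) for z in samples]
    return [sum(g[i] for g in grads) / len(grads) for i in range(len(x0))]


def c_lime(x0, samples):
    """Linear model with intercept fit to f over the neighborhood (squared error)."""
    rows = [[1.0] + [zi - xi for zi, xi in zip(z, x0)] for z in samples]
    targets = [f(z) for z in samples]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve(xtx, xty)[1:]  # drop the intercept, keep feature weights


rng = random.Random(0)
x0, sigma = [0.5, -1.0], 0.3
neighborhood = [[xi + rng.gauss(0.0, sigma) for xi in x0] for _ in range(5000)]

smoothgrad_attr = smoothgrad(x0, neighborhood)
c_lime_attr = c_lime(x0, neighborhood)
```

With a shared Gaussian neighborhood the two attribution vectors come out close to each other, which is consistent with the result that C-LIME and SmoothGrad converge to the same explanation in expectation.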
Function Approximation Perspective to 
Characterizing Post hoc Explanation Methods 


- No Free Lunch Theorem for Explanation Methods: No single 
method can perform optimally across all neighborhoods 


Theorem 3 (No Free Lunch for Explanation Methods). Consider the scenario where we explain a 
black-box model $f$ around point $x_0$ using an interpretable model $g$ from class $\mathcal{G}$ and a valid loss 
function $\ell$ where the distance between $f$ and $\mathcal{G}$ is given by $d(f, \mathcal{G}) = \min_{g \in \mathcal{G}} \max_{x \in \mathcal{X}} \ell(f, g, 0, x)$. Then, 
for any explanation $g^{*}$ on a neighborhood distribution $\xi_1 \sim \mathcal{Z}_1$ such that $\max_{\xi_1} \ell(f, g^{*}, x_0, \xi_1) \leq \epsilon$, 
we can always find another neighborhood $\xi_2 \sim \mathcal{Z}_2$ such that $\max_{\xi_2} \ell(f, g^{*}, x_0, \xi_2) \geq d(f, \mathcal{G})$. 


Agenda 

Inherently Interpretable Models 

Post hoc Explanation Methods 

Evaluating Model Interpretations/Explanations 

Empirically & Theoretically Analyzing Interpretations/Explanations 


Future of Model Understanding 
Methods for More Reliable Post hoc 
Explanations 


* We have seen several limitations in the behavior of post hoc 
explanation methods, e.g., unstable, inconsistent, fragile, not 
faithful 


* While there are already attempts to address some of these 
limitations, more work is needed 


Alvarez-Melis, 2018 


Challenges with LIME: Stability 


- Perturbation approaches like LIME/SHAP are unstable 


[Figure: Lipschitz estimate of LIME and SHAP on the glass, wine, ionosphere, and leukemia datasets.]


Challenges with LIME: Consistency 


[Figure panels: (a) linear, many samples; (b) linear, fewer samples; (c) nonlinear, many samples; (d) nonlinear, fewer samples. Many = 250 perturbations; Few = 25 perturbations.]


When you repeatedly run LIME on the same instance, 
you get different explanations (blue region) 


Slack et al., 2020 


Challenges with LIME: Scalability 


- Querying complex models (e.g., Inception Network, ResNet, 
AlexNet) repeatedly for labels can be computationally 
prohibitive 


- Large number of perturbations → large number of model 
queries 


Generating reliable explanations using LIME can be 
computationally expensive! 


Explanations with Guarantees: BayesLIME 
and BayesSHAP 


- Intuition: Instead of point estimates of feature importances, 
define these as distributions 


[Figure: distributions of absolute feature importance (x-axis from 0.05 to 0.25). (a) Explanation computed with 100 perturbations. (b) Explanation for the same instance with 2000 perturbations.]


BayesLIME and BayesSHAP 


- Construct a Bayesian locally weighted regression that can 
accommodate LIME/SHAP weighting functions 


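$$y \mid z, \phi, \sigma^2 \sim \phi^{T} z + \epsilon, \qquad \epsilon \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{\pi_x(z)}\right)$$

Here y are the black box predictions, φ the feature importances, z the perturbations, and π_x(z) the weighting function. 


$$\phi \mid \sigma^2 \sim \mathcal{N}(0, \sigma^2 I), \qquad \sigma^2 \sim \text{Inv-}\chi^2(n_0, \sigma_0^2)$$

Priors on feature importances and feature importance 
uncertainty 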


BayesLIME and BayesSHAP: Inference 


- Conjugacy results in following posteriors 


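$$\sigma^2 \mid Z, Y \sim \text{Scaled-Inv-}\chi^2\!\left(n_0 + N, \; \frac{n_0 \sigma_0^2 + N s^2}{n_0 + N}\right)$$

$$\phi \mid \sigma^2, Z, Y \sim \text{Normal}(\hat{\phi}, \; V_\phi \sigma^2)$$


- We can compute all parameters in closed form 


$$\hat{\phi} = V_\phi \left( Z^{T} \operatorname{diag}(\Pi_x(Z)) \, Y \right)$$

$$V_\phi = \left( Z^{T} \operatorname{diag}(\Pi_x(Z)) \, Z + I \right)^{-1}$$

$$s^2 = \frac{1}{N} \left[ (Y - Z\hat{\phi})^{T} \operatorname{diag}(\Pi_x(Z)) \, (Y - Z\hat{\phi}) + \hat{\phi}^{T} \hat{\phi} \right]$$

Note on the slide, attached to the posterior mean: "These are the same as LIME & SHAP!" 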


Estimating the Required Number of Perturbations 


"I need an explanation where true 
feature importance lies within ±0.5 
of estimated values with 95% 
confidence" 


THEOREM 3.3. Given S seed perturbations, the number of additional 
perturbations required (G) to achieve a credible interval width 
W of feature importance for a data point x at user-specified confidence 
level α can be computed as: 

$$G(\bar{\pi}_S, s_S^2) = \frac{4 s_S^2}{\bar{\pi}_S \times \left[ W / \Phi^{-1}(\alpha) \right]^2} - S \qquad (9)$$

where $\bar{\pi}_S$ is the average proximity $\pi_x(z)$ for the S perturbations, $s_S^2$ 
is the empirical sum of squared errors (SSE) between the black box 
and local linear model predictions, weighted by $\pi_x(z)$, as in (7), and 
$\Phi^{-1}(\alpha)$ is the two-tailed inverse normal CDF at confidence level α. 


Estimate required number of perturbations for user specified 
uncertainty level. 
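
The perturbation estimate in the theorem reduces to a few lines of arithmetic. The sketch below assumes the garbled expression reads G = 4 s_S² / (π̄_S [W / Φ⁻¹(α)]²) − S, and it assumes that the "two-tailed inverse normal CDF" at level α means the standard normal quantile at 0.5 + α/2 (about 1.96 for α = 0.95). The function name and the input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def additional_perturbations(seed_count, mean_proximity, weighted_sse,
                             width, confidence):
    """Additional perturbations G needed for a credible interval of the given
    width, under the assumed reading of the theorem stated in the lead-in."""
    # Two-tailed z value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    total_needed = 4.0 * weighted_sse / (mean_proximity * (width / z) ** 2)
    return max(0, ceil(total_needed - seed_count))


# Hypothetical quantities measured from S = 200 seed perturbations.
needed_narrow = additional_perturbations(
    seed_count=200, mean_proximity=0.5, weighted_sse=0.1,
    width=0.05, confidence=0.95)
needed_wide = additional_perturbations(
    seed_count=200, mean_proximity=0.5, weighted_sse=0.1,
    width=0.10, confidence=0.95)
```

Halving the requested interval width roughly quadruples the total number of perturbations, which is the usual inverse-square relationship between sample size and interval width.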
Improving Efficiency: Focused Sampling 


- Instead of sampling perturbations randomly and querying the 
black box, choose points the learning algorithm is most uncertain 
about and only query their labels from the black box. 


This approach allows us to construct explanations with 
user defined levels of confidence in an efficient manner! 


Other Questions 


- Can we construct post hoc explanations that are provably robust 
to various adversarial attacks discussed earlier? 


- Can we construct post hoc explanations that can guarantee 
faithfulness, stability, and fairness simultaneously? 
Theoretical Analysis of the Behavior of 
Explanations/Models 


* We discussed some of the recent theoretical results earlier. Despite these, several 
important questions remain unanswered 


* Can we characterize the conditions under which each post hoc explanation method 
(un)successfully captures the behavior of the underlying model? 


* Given the properties of the underlying model and data distribution, can we theoretically 
determine which explanation method should be employed? 


* Can we theoretically analyze the nature of the prototypes/attention weights learned by 
deep nets with added layers? When are these meaningful/when are they spurious? 


Empirical Analysis of Correctness/Utility 


While there is already a lot of work on empirical analysis of correctness/utility for 
post hoc explanation methods, there is still no clear characterization of which 
methods (if any) are correct/useful under what conditions. 


There is even less work on the empirical analysis of the correctness/utility of the 
interpretations generated by inherently interpretable models. For instance, are the 
prototypes generated by adding prototype layers correct/meaningful? Can they be 
leveraged in any real world applications? What about attention weights? 
Characterizing Similarities and Differences 


* Several post hoc explanation methods exist which employ diverse 
algorithms and definitions of what constitutes an explanation. Under what 
conditions do these methods generate similar outputs (e.g., top K features)? 


* Multiple interpretable models output natural/synthetic prototypes 
(e.g., Li et al., Chen et al., etc.). When do they generate similar answers and 
why? 


[Coppens et. al., 2019, Amir et. al. 2018] 
[Ying et. al., 2019] 


Model Understanding Beyond 
Classification 


How to think about interpretability in the context of large language models and 
foundation models? What is even feasible here? 


Already active work on interpretability in RL and GNNs. However, very little 
research on analyzing the correctness/utility of these explanations. 


Given that primitive interpretable models/post hoc explanations suffer from so 
many limitations, how to ensure explanations for more complex models are 
reliable? 
Intersections with Model Robustness 


* Are inherently interpretable models with prototype/attention layers more robust 
than those without these layers? If so, why? 


* Are there any inherent trade-offs between (certain kinds of) model interpretability 
and model robustness? Or do these aspects reinforce each other? 


Prior works show that counterfactual explanation generation algorithms output 
adversarial examples. What is the impact of adversarially robust models on these 
explanations? [Pawelczyk et. al., 2022] 
Intersections with Model Fairness 


It is often hypothesized that model interpretations and explanations help unearth unfairness biases 
of underlying models. However, there is little to no empirical research demonstrating this. 


We need more empirical evaluations and user studies to determine how interpretations and 
explanations can complement statistical notions of fairness in identifying racial/gender biases 


How does the fairness (statistical) of inherently interpretable models compare with that of vanilla 
models? Are there any inherent trade-offs between (certain kinds of) model interpretability and 
model fairness? Or do these aspects reinforce each other? 
[Harder et al., 2020; Patel et al. 2020] 
Intersections with Differential Privacy 


* Model interpretations and explanations could potentially expose 
sensitive information from the datasets. 


* Little to no research on the privacy implications of interpretable 
models and/or explanations. What kinds of privacy attacks (e.g., 
membership inference, model inversion etc.) are enabled? 


* Do differentially private models help thwart these attacks? If so, 
under what conditions? Should we construct differentially 
private explanations? 


[Lakkaraju et al., 2022, Slack et al., 2022] 


New Interfaces, Tools, Benchmarks for 
Model Understanding 


* Can we construct more interactive interfaces for end users to 
engage with models? What would be the nature of such 
interactions? [demo] 


* As model interpretations and explanations are employed in 
different settings, we need to develop new benchmarks and tools 
for enabling comparison of faithfulness, stability, fairness, utility 
of various methods. How to enable that? 


Some Parting Thoughts.. 


There has been renewed interest in model understanding over the past 
half decade, thanks to ML models being deployed in healthcare and other 
high-stakes settings 


As ML models continue to get increasingly complex and they continue to 
find more applications, the need for model understanding is only going 
to rise 


Lots of interesting and open problems waiting to be solved 


You can approach the field of XAI from diverse perspectives: theory, 
algorithms, HCI, or interdisciplinary research - there is room for 
everyone! 


Thank You! 


* Acknowledgements: Special thanks to Julius Adebayo, Chirag Agarwal, Shalmali Joshi, and Sameer 
Singh for co-developing and co-presenting sub-parts of this tutorial at NeurIPS, AAAI, and FAccT 
conferences. 


* Email: hlakkaraju@hbs.edu; hlakkaraju@seas.harvard.edu 
* Course on interpretability and explainability: https://interpretable-ml-class.github.io/ 


* More tutorials on interpretability and explainability: https://explainml-tutorial.github.io/ 


* Trustworthy ML Initiative: https://www.trustworthyml.org/ 
  - Lots of resources and seminar series on topics related to explainability, fairness, adversarial 
    robustness, differential privacy, causality etc. 


[Figure: TalkToModel demo. The model greets the user: "Hello Hima, I'm a machine learning model trained to predict whether someone has diabetes. Let's get started. Ask me something!"]


